Thank you for reviewing, Shakeel.

> Do we need to trace highest_zoneidx at the end? Can it change within
> balance_pgdat()?

highest_zoneidx does not change within a balance_pgdat() invocation. It
is passed in as an argument and remains the classzone bound used for the
balancing checks throughout the function.

I kept highest_zoneidx in the end tracepoint to make the outcome event
self-contained. In principle, begin/end correlation is possible, but
under sustained memory pressure kswapd reclaim can be frequent enough
that consumers may prefer to analyze end events directly. Any dependence
on matching begin/end also becomes less convenient and less robust in
the presence of filtering or dropped trace records. Since nr_reclaimed
and the final order are only known at the end, having highest_zoneidx
there allows end-only analysis without correlating with the begin event.

For example, it lets users answer questions like:

- this pass reclaimed too much or too little memory; what
  highest_zoneidx did that result correspond to?
- how much reclaim was done when balancing up to ZONE_NORMAL vs other
  classzone bounds?
- when highest_zoneidx == ZONE_NORMAL, how often did reclaim finish at
  order=0?

So it is there because it provides context for the end-of-reclaim
result.

Do you think this is sufficient justification? If not, then I can drop
it from the end tracepoint in v2.

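To make the end-only use case concrete, here is a rough sketch of the kind of consumer I have in mind. It is illustrative only: the event name mm_vmscan_balance_pgdat_end, the key=value text layout, and the field names nid, order, highest_zoneidx and nr_reclaimed are assumptions made for this example and may not match the final trace format. It groups end records by highest_zoneidx without looking at any begin record.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) event name and key=value field layout.
END_EVENT = "mm_vmscan_balance_pgdat_end"
FIELD_RE = re.compile(r"(\w+)=(-?\d+)")
REQUIRED = {"highest_zoneidx", "order", "nr_reclaimed"}


def summarize_end_events(lines):
    """Aggregate end records per highest_zoneidx, ignoring begin records."""
    stats = defaultdict(lambda: {"passes": 0, "nr_reclaimed": 0, "order0": 0})
    for line in lines:
        if END_EVENT not in line:
            continue  # begin events and unrelated records are skipped
        payload = line.split(END_EVENT, 1)[1]
        fields = {key: int(val) for key, val in FIELD_RE.findall(payload)}
        if not REQUIRED <= fields.keys():
            continue  # truncated or unexpected record
        entry = stats[fields["highest_zoneidx"]]
        entry["passes"] += 1
        entry["nr_reclaimed"] += fields["nr_reclaimed"]
        entry["order0"] += int(fields["order"] == 0)
    return dict(stats)


# Made-up sample records, not real trace output. The numeric zone index
# that corresponds to ZONE_NORMAL depends on the kernel configuration.
SAMPLE = [
    "kswapd0-85 [001] 100.000001: mm_vmscan_balance_pgdat_end: nid=0 order=0 highest_zoneidx=2 nr_reclaimed=128",
    "kswapd0-85 [001] 100.000200: mm_vmscan_balance_pgdat_begin: nid=0 order=3 highest_zoneidx=2",
    "kswapd0-85 [001] 100.000900: mm_vmscan_balance_pgdat_end: nid=0 order=0 highest_zoneidx=2 nr_reclaimed=64",
    "kswapd0-85 [001] 100.002000: mm_vmscan_balance_pgdat_end: nid=0 order=3 highest_zoneidx=1 nr_reclaimed=32",
]

if __name__ == "__main__":
    for zoneidx, entry in sorted(summarize_end_events(SAMPLE).items()):
        print(zoneidx, entry)
```

Because every end record carries its own highest_zoneidx, the summary stays correct even if the begin record in the sample is filtered out or dropped.
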
----- Original Message -----
From: "Shakeel Butt" <[email protected]>
To: "Bunyod Suvonov" <[email protected]>
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], "zhengqi arch" <[email protected]>, [email protected], "mathieu desnoyers" <[email protected]>, [email protected], [email protected], [email protected]
Sent: Friday, April 24, 2026 1:46:55 AM
Subject: Re: [PATCH] mm/vmscan: add balance_pgdat begin/end tracepoints

On Thu, Apr 23, 2026 at 06:37:53PM +0800, Bunyod Suvonov wrote:
> Vmscan has six main reclaim entry points: try_to_free_pages() for
> direct reclaim, try_to_free_mem_cgroup_pages() for memcg reclaim,
> mem_cgroup_shrink_node() for memcg soft limit reclaim, node_reclaim()
> for node reclaim, shrink_all_memory() for hibernation reclaim, and
> balance_pgdat() for kswapd reclaim.
>
> All of them, except for shrink_all_memory() and balance_pgdat(), already
> have begin/end tracepoints. This makes it harder to trace which reclaim
> path is responsible for memory reclaim activity, because kswapd reclaim
> cannot be identified as cleanly as other reclaim entry points, even
> though it is the main background reclaim path under memory pressure.
> There may be no need to trace shrink_all_memory() as it is primarily
> used during hibernation. So this patch adds the missing tracepoint pair
> for balance_pgdat().
>
> The begin tracepoint records the node id, requested reclaim order, and
> highest_zoneidx. The end tracepoint records the node id, reclaim order
> that balance_pgdat() finished with, highest_zoneidx, and nr_reclaimed.

Do we need to trace highest_zoneidx at the end? Can it change within
balance_pgdat()?

> Together, they show the requested reclaim order and zone bound, whether
> reclaim fell back to a lower order, and how much reclaim work was done.
>
> Signed-off-by: Bunyod Suvonov <[email protected]>

Overall looks good.
