Hi Alexander,

We had a bunch of issues to fix in the master-worker and I also took the
opportunity to take a look at your patch this morning.

I started to write a small regression test suite for the master-worker because
it's kind of difficult to test it with our current vtest suite. I still need to
finish it before pushing but that will help us with that kind of issues.

I also agree that having an ascending order would be more logical, we broke
that by accident with the 3.1 rework.

I pushed your patch in master and will backport it in previous versions.


On Thu, Mar 19, 2026 at 12:34:38PM +0000, Stephan, Alexander wrote:
> Subject: RE: BUG/MINOR: mworker/cli: avoid duplicates in 'show proc'
> Hi,
> 
> Did you have a chance to take a look yet? Thanks!
> 
> Best,
> 
> Alexander
> 
> -----Original Message-----
> From: Stephan, Alexander
> Sent: Friday, February 20, 2026 10:11 AM
> To: [email protected]
> Subject: FW: BUG/MINOR: mworker/cli: avoid duplicates in 'show proc'
> 
> Now with list in CC, sry!
> 
> -----Original Message-----
> From: Stephan, Alexander
> Sent: Thursday, February 19, 2026 2:22 PM
> To: 'William Lallemand' <[email protected]>
> Cc: Willy Tarreau <[email protected]>
> Subject: RE: BUG/MINOR: mworker/cli: avoid duplicates in 'show proc'
> 
> Hi William and Willy,
> 
> First, thanks for the backports! However, I noticed an issue with the 
> backport on stable branches <= 3.0 where the entries get re-emitted after 198 
> processes.
> 
> The final proc_list ordering differs between the versions:
> 
> <= 3.0: In haproxy.c main(), the new worker is appended to proc_list first 
> (before mworker_env_to_proc_list() is called), so deserialized old workers 
> end up after the current worker.
> This would mean old (LEAVING) workers appear in the list in descending reload 
> order (newest old worker first).
> 
> >= 3.1 (current master): mworker_env_to_proc_list() runs early in init 
> >(haproxy.c:3311), appending all deserialized entries first.
> Then mworker_prepare_master() (haproxy.c:3350) appends the new worker last. 
> Old workers are therefore in ascending reload order (oldest first).
> 
> The original pagination fix (commit 594408c) used child->reloads >= 
> ctx->next_reload to skip already-printed entries on resume.
> This comparison is direction-dependent — it only works correctly when the 
> list is in one specific order.
> On branches where the order is reversed (i.e. <= 3.0, so effectively 2.9, 
> 3.0, since the other patch was only backported to this version), it loops 
> back to the beginning after ~198 processes.
> 
> A version-specific fix, applicable to 2.9 and 3.0 would be trivial (flip >= 
> to <), created from 3.0.16 in this case:
> 
> diff --git a/src/mworker.c b/src/mworker.c index 355e015..fb830fc 100644
> --- a/src/mworker.c
> +++ b/src/mworker.c
> @@ -638,8 +638,8 @@ static int cli_io_handler_show_proc(struct appctx *appctx)
>                         if (up <= 0) /* must never be negative because of 
> clock drift */
>                                 up = 0;
> 
> -                       /* If we're resuming, skip entries that were already 
> printed (reload >= ctx->next_reload) */
> -                       if (ctx->next_reload && child->reloads >= 
> ctx->next_reload)
> +                       /* If we're resuming, skip entries that were already 
> printed (reload < ctx->next_reload) */
> +                       if (ctx->next_reload && child->reloads < 
> ctx->next_reload)
>                                 continue;
> 
>                         if (!(child->options & PROC_O_TYPE_WORKER))
> --
> 2.43.7
> 
> But as we've seen, this approach is fragile and will depend on the specific 
> HAProxy version.
> From what I understood, the backported commits should be apply to all target 
> versions as is.
> 
> So, instead of a directional comparison, the new approach:
> 
> I renamed ctx->next_reload to ctx->resume_reload and changed its semantics: 
> it now stores the reload count of the last successfully flushed row, not the 
> failed one.
> On flush failure, the value is left unchanged, so the failed row is replayed 
> on the next call.
> On resume, walk the list and skip entries until we find the one whose 
> child->reloads == skip, then resume from the next entry.
> This works identically whether the list is ascending or descending.
> 
> For correctness, we also need to handle the deletion of the marker entry 
> (e.g., a worker process exits and SIGCHLD removes it from proc_list between 
> handler calls).
> Here, track the previous LEAVING entry's reload count during the skip phase. 
> If two consecutive LEAVING entries straddle the skip value (one has reloads > 
> skip, the other has reloads < skip, or vice versa), the deleted entry's 
> former position has been crossed, so skipping stops and the current entry is 
> emitted.
> 
> This makes the pagination completely direction-agnostic — it doesn't matter 
> whether old workers are in ascending or descending order in proc_list.
> 
> IMO, it should be backported to all stable branches. On branches where 
> proc_list happens to be in descending order (2.9, 3.0), the fix works the 
> same way since the skip logic no longer depends on list direction.
> 
> What do you think of this? In general, I would find a stable ascending order 
> in >= 3.1 that is consistent to 3.0 ideal. This would also prevent tools that 
> use the output from breaking.
> But I am not sure about the implications of changing this again.
> As usual, I attach my patch to make sure my email client doesn't mess with it.
> 
> Best regards,
> 
> Alexander
> 
> 
> -----Original Message-----
> From: William Lallemand <[email protected]>
> Sent: Wednesday, January 28, 2026 3:42 PM
> To: Stephan, Alexander <[email protected]>
> Cc: [email protected]
> Subject: Re: BUG/MINOR: mworker/cli: avoid duplicates in 'show proc'
> 
> Hello Alexander,
> 
> On Mon, Jan 26, 2026 at 04:16:56PM +0000, Stephan, Alexander wrote:
> > Subject: RE: BUG/MINOR: mworker/cli: avoid duplicates in 'show proc'
> > Hi William,
> >
> > I have been investigating the backport situation a bit and I wanted to 
> > quickly share some findings that might be useful.
> >
> 
> I backported the patches to version 3.2, but as you already observed, there's 
> a lot of changes before this version. The "program" section still exists in 
> these versions and the code was not written for it.
> 
> I backported a few patches to do the backport as far as 3.0 but I don't think 
> it's worth it going further given the impact of the problem. We probably 
> don't want to mess with a feature which works for 99% of usecases in an older 
> branch.
> 
> > So, I noticed a difference regarding the process list before version 3.0, 
> > which would need to be accounted for.
> > "show proc" still uses the default parser here, which leaves
> > appctx->svcctx uninitialized:
> > https://gith/
> > ub.com%2Fhaproxy%2Fhaproxy%2Fblob%2Ff76e73511addd075f556449b0ebf33c9f7
> > a5184b%2Fsrc%2Fmworker.c%23L803C100-L803C117&data=05%7C02%7Calexander.
> > stephan%40sap.com%7C44fb7a032eec4a725fec08de5e7b6d62%7C42f7676cf455423
> > c82f6dc2d99791af7%7C0%7C0%7C639052081403606857%7CUnknown%7CTWFpbGZsb3d
> > 8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiT
> > WFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=P%2B%2Foi%2BYVifCU82eQOCUcC
> > VoPQ5SGh0d8wd74M%2BeBBvI%3D&reserved=0
> > Therefore, additional context would be needed like in the patch below where 
> > also the custom parser method needs to be copied from master.
> >
> > Furthermore, from what I can see the process list works a bit
> > differently when it comes to reloads, using a min counter:
> > https://gith/
> > ub.com%2Fhaproxy%2Fhaproxy%2Fblob%2Ff76e73511addd075f556449b0ebf33c9f7
> > a5184b%2Fsrc%2Fmworker.c%23L174&data=05%7C02%7Calexander.stephan%40sap
> > .com%7C44fb7a032eec4a725fec08de5e7b6d62%7C42f7676cf455423c82f6dc2d9979
> > 1af7%7C0%7C0%7C639052081403631160%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1h
> > cGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIj
> > oyfQ%3D%3D%7C0%7C%7C%7C&sdata=xitlobHSOmFfjcC6uNTAvExXTMuptzybtemtOcqi
> > vD4%3D&reserved=0 With the 1:1 logic from master, I again run into
> > duplicates and messed up ordering in the the "show proc" output.
> >
> > I had success fixing this by adjusting the iteration, replacing
> > ctx->next_reload = child->reloads with ctx->next_reload =
> > child->reloads + 1; and replace
> >
> > if (ctx->next_reload && child->reloads >= ctx->next_reload)
> >       continue;
> > with
> >
> > if (ctx->next_reload && child->reloads < ctx->next_reload)
> >       continue;
> >
> > in the old worker part. I appended the resulting patch which directly 
> > applies to 3.0-dev13 (just as an example, not fully ready for a commit yet).
> >
> > With this, it works very well for me for the backport targets 2.8 and 3.0 
> > (deferred from the issue https://github.com/haproxy/haproxy/issues/3204).
> >
> > I hope this helps, I am also curious whether you agree with my approach 
> > here.
> >
> > Thanks in advance, and best regards,
> 
> Thank you for the investigating the problem. Regarding your patch, we try to 
> backport the patches instead of writing new ones in previous branches. So for 
> the 3.0 branch I backported these ones:
> 
> e41829dda6 BUG/MINOR: mworker/cli: fix show proc pagination using reload 
> counter
> 9cae4b8f28 BUG/MINOR: mworker/cli: 'show proc' is limited by buffer size
> 48d20c79e7 MINOR: mworker/cli: 'show proc debug' for old workers 5b851f94af 
> MINOR: mworker/cli: remove comment line for program when useless 948a4c372b 
> MINOR: mworker/cli: add 'debug' to 'show proc'
> d55d3c6225 CLEANUP: mworker/cli: remove useless variable
> 
> These are minor changes that are not that intrusive for a backport, 
> backporting without them would change too much the original patches, so I 
> took them in 3.0.
> 
> Regards,
> 
> --
> William Lallemand

-- 
William Lallemand


Reply via email to