Hi,

On Fri, Dec 12, 2025 at 06:00:37AM +0100, Willy Tarreau wrote:
> Thanks for your explanations and cleanups. I agree that overall this
> looks cleaner after your patches, but I'll let William review your
> patches as I'm not the best qualified on this area. Howeve, I'm having
> a comment about the issue you reported above. If it indeed happens that
> multiple processes have the same timestamp (two reloads in the same
> second), for me it simply means that this timestamp is not the correct
> enumeration key since it is not a discriminator. And generally speaking,
> timestamps are only usable as a restart point when used in batches, like
> we do in the scheduler or like you're indeed doing in our approach, and
> when they're guaranteed to be monotonic, which is not even the case here
> if system time changes between reloads.

That's a good point, we indeed don't have a monotomic timestamp in there, it
might forgot some processes if the machine timestamp go back in time. But that
won't be the case if we batch when the timestamp is different instead of
greater than the previous one.

> Or can a process disappear from the list during the dump ? (I don't know
> if the list is adjusted when a process quits).

Indeed, processes can disappear of the list in a non-predictable order.

> If so, then we'd need > another index.
> It *seems* to me that the number of reloads is strictly
> monotonous since we never have more than one process per reload so we
> could use the reload counter and start from the highest possible value.

I didn't wanted to use that at first, because "programs" also have a reload
counter, but we just have to filter by worker, that would suffise.

> And if for any reason it causes any remaining problem, then we could
> just add an index in the structure. We're only speaking about 4 bytes
> per process... that's even less than the extra code needed to deal
> with the ambiguity of the timestamp.

We probably don't need that if using the reload field instead of the
timestamps.

> > In theory, there is still a problem left with missing entries if there is a
> > reload during the dump that has been also mentioned by William in the issue,
> > but that is out of scope for now, I think.
> 
> William mentioned this to me, and there's indeed no simple solution to
> this. The reason is simple: you never want a connected CLI to prevent
> an old process from leaving, otherwise any monitoring process that
> remains connected blocks your reloads. We could imagine disconnecting
> only between commands, i.e. at the prompt, but then there's also the
> case where an old process is actively waiting on a command (just think
> about "show events -w" used to wait for new logs). So if my memory serves
> me right, at the moment this is handled by not counting CLI sessions as
> "jobs", i.e. important tasks that prevent the process from quitting.
> Other approaches could be imagined, but here it's clearly visible that
> waiting for new logs is an example of situation where even reaching the
> end of the log buffer is not a sufficient condition for quitting. Some
> might consider that this combined with jobs==0 might possibly work, but
> honestly I don't know if that's sufficient and will always stop. You
> could for example imagine someone accessing the CLI over TLS via a
> local TLS offloader, meaning that you suddenly have one job per CLI
> connection, etc. So this is indeed a much more complex topic that's
> outside the scope of just this command.


The reload in the master does not work like the stopping state of
the worker, it just does the exec() upon a "reload" or SIGUSR2 signal, but does
not try to finish any jobs, since we are not supposed to have jobs to finish in
this process. The connected sockets are all terminated because of the CLOEXEC,
so we're not trying to close properly sockets by waiting the end of things.

We probably could try to use the job counter, but like you mention the "show
events -w", commands like "show sess" could never end, or things like the @@
prefix. This behavior can't be change simply.

But in the meantime, you still are able to check if there's a newline at the
end of the output to ensure you really reach the end of the data.


-- 
William Lallemand


Reply via email to