Re: [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection

Martin Wilck Fri, 27 Feb 2026 00:37:03 -0800

On Thu, 2026-02-26 at 13:48 -0500, Benjamin Marzinski wrote:
> On Wed, Feb 25, 2026 at 11:00:44PM +0100, Martin Wilck wrote:
> > 
> > I'm still undecided if it's worth changing this code in order to
> > reinstate some paths 1s earlier in certain sitautions.
> > 
> > Thoughts?
> 
> I don't see any problems with this. But I also didn't see any real
> problems with Brian's version (although this version does remove most
> of
> my theoretic issues. Like I said before. If we hang in the condlog,
> we've got bigger problems than a stray checker thread). If we do
> this,
> we should probably document the uatomic_add_return() call,


Of course, this was not an official patch submission, just an idea how
we might improve this code. It's not exactly the same thing; Brian's
patch meant simply ignoring ct->running, while mine would make the ct-
>running check reliable, albeit with changed semantics.

>  or just use
> cmm_smp_mb() directly. But I still not sure what we get out of this,
> other than a very slight chance to finish the checker a second
> sooner.

Yes. But it's also a fact that the simple atomic_read() test in the
current code isn't reliable. It can well return true when ct->running
has already been set to false. I guess the implementation tends to be
as light-weight as possible, avoiding a memory barrier (or taking a
lock with the side effect of a memory barrier) in the "fast path". That
comes with the price that we may miss an already finished checker
thread.

> don't think that really makes a difference, but I don't really see
> much
> risk in the patch either, so I guess I'm fine either way.

If the risk was truly zero, I would agree. But experience tells that
it's very easy to get these things wrong for corner cases, and I tend
to want to avoid even a minimal risk just for "very slight chance to
finish the checker a second sooner", as you've expressed it very
nicely.

@Brian, feel free to try to convince us otherwise. It might be helpful
if we understood your use case better.

Martin

Re: [PATCH 0/1] checkers: fix race in async TUR and ALUA result collection

Reply via email to