On Wed, 2026-03-04 at 09:38 -0800, Brian Bunker wrote:
> Ben and Martin,
>
> Sorry for the delayed response. I sent this to you both initially
> instead of to the list because I thought you might have some insight
> into the issue.
> I have an 'alua' checker that I will post where I found this issue. I
> didn't actually find it in the 'tur' checker but I could see it is
> prone to the same issue. I have a test I created to show the issue
> called test_tur.c. I will send this too if it helps.
> The problem when I didn't use this patch was that my checker the
> entire checker thread would stall. It didn't just correct itself.
> Until other paths failed the checker thread was deadlocked somehow.
>
> In the 'tur' checker without this change, the 'running != 0' will
> lead to PATH_PENDING and MSG_TUR_RUNNING without looking at what
> those actual values are. It might actually be finished, but running
> hasn't changed to 0 yet, and ct->thread is not 0 on returning. I am
> not sure what extra conditions were true when my checker thread
> deadlocks, but with this change the deadlock never happens.
>
This is a stronger reason to make a change than 1 second gained for a
path reinstantiation.

IIRC you made a change to your ALUA checker analogous to the one you
sent us for the TUR checker, and that solved the deadlock issue for
you? It's hard for me to understand with the TUR checker code in my
mind. My guess is that your checker does not work exactly like the TUR
checker, and that you have some other, more general issue. But again,
without examining the new checker very carefully and possibly testing
it, it's impossible to tell.

Could you try the approach I sent in my other email instead of yours,
and see if it makes a difference?

In general, I'm not surprised that you saw deadlocks when trying to
write a new checker. As I said previously, we've had our share of them
in the past with the TUR checker. This isn't easy to get right.

I suggest that you leave the TUR patch aside for now, make your checker
work, and send patches for review. Maybe when we see the code we will
be able to understand better what's going on.

Regards
Martin
