Hi Pieter,

On Sat, Jun 08, 2019 at 06:07:09AM +0200, Willy Tarreau wrote:
> Hi Pieter,
> 
> On Fri, Jun 07, 2019 at 11:32:18PM +0200, PiBa-NL wrote:
> > Hi Willy,
> > 
> > After the commit "6ec902a MINOR: threads: serialize threads initialization"
> > however i have failing / slow health checks in the tls_health_checks.vtc
> > test. Before that the Layer7-OK takes 5ms after this commit the healthcheck
> > takes up to 75ms or even more.. It causes the 20ms connect/server timeouts
> > to also to fail the test fairly often but not always..
> 
> This is very strange, as the modification only involves threads startup.
> Hmmm actually I'm starting to think about a possibility I need to verify.
> I suspect it may happen that a thread manages to finish its initialization
> before others request synchronzation, thus belives it's alone and starts.
> I'm going to have a deeper look at this problem with this in mind. I didn't
> notice the failed check here but I'll hammer it a bit more.

Sorry for the long silence, it was harder than I thought. So I never managed
to reproduce this typical issue, even by adding random delays here and there,
but I managed to see that some threads were starting the event loop before
others were done initializing, which will obviously result in issues such as
missed events that could result in what you observed.

I initially thought I could easily add a synchronization step using the
current two bit fields (and spent my whole week-end writing parallel
algorithms and revisiting all our locking mechanism just because of this).
After numerous failed attempts, I later figured that I needed to represent
more than 4 states per thread and that 2 bits are not enough. Bah... at
least I had fun time... Thus I added a new field and a simple function to
allow the code to start in synchronous steps. We now initialize one thread
at a time, then once they are all initialized we enable the listeners, and
once they are enabled, we start the pollers in all threads. It is pretty
obvious from the traces that it now does the right thing. However since I
couldn't reproduce the health check issue you were facing, I'm interested
in knowing if it's still present with the latest master, as it could also
uncover another issue.

Thanks!
Willy

Reply via email to