Hi Pieter,

On Tue, Jun 11, 2019 at 12:38:42AM +0200, PiBa-NL wrote:
> After a bit more fiddling I noticed that the new startup method seems more
> CPU intensive. It can also be seen that the vtest takes a bit longer to
> pass, 1.3sec vs. 0.8sec, even though the health-check durations themselves
> are short as expected. It is also using quite a bit more 'user' cpu. I was
> wondering if this is a consequence of the new init sequence, or whether
> some improvement is still needed there? I noticed this after trying to run
> multiple tests simultaneously again; they interfered with each other more
> than they used to.
> 
> 2.0-dev6-e4d7c9d 2019/06/10
> **   h1    1.279 WAIT4 pid=63820 status=0x0002 (user 4.484293 sys 0.054781)
> **   h2    1.394 WAIT4 pid=63823 status=0x0002 (user 4.637692 sys 0.015588)
> #    top  TEST ./test/tls_health_checks-org.vtc passed (1.395)
> 
> Before the latest changes it used less 'user':
> 2.0-dev6-e186161 2019/06/07
> **   h1    0.783 WAIT4 pid=65811 status=0x0002 (user 1.077052 sys 0.031218)
> **   h2    0.897 WAIT4 pid=65814 status=0x0002 (user 0.341360 sys 0.037928)
> #    top  TEST ./test/tls_health_checks-org.vtc passed (0.899)

Hmmm that's a bit annoying. Not dramatic but annoying.

> And with 'nbthread 1' the user cpu usage is dramatically lower still with
> the same test.
> 2.0-dev6-e4d7c9d 2019/06/10
> **   h1    0.684 WAIT4 pid=67990 status=0x0002 (user 0.015203 sys 0.015203)
> **   h2    0.791 WAIT4 pid=67993 status=0x0002 (user 0.013551 sys 0.009034)
> #    top  TEST ./test/tls_health_checks-org.vtc passed (0.793)
> 
> 2.0-dev6-e186161 2019/06/07
> **   h1    0.682 WAIT4 pid=65854 status=0x0002 (user 0.007158 sys 0.021474)
> **   h2    0.790 WAIT4 pid=65857 status=0x0002 (user 0.007180 sys 0.014361)

Here I don't understand how a single thread can have changed anything,
given that there's no other thread to wait for: all that changed was the
addition of some inter-thread synchronization during startup, so that
threads don't start using resources that other threads have not yet
initialized (typically listening FDs).
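
What I mean by "inter-thread synchronization during startup" is roughly a
rendezvous like the sketch below (hypothetical names, not the actual
haproxy code): each thread announces that it has finished its local init,
then spins until all threads have done so before touching shared resources
such as listening FDs.

    #include <stdatomic.h>
    #include <sched.h>

    static atomic_int threads_ready;    /* threads done with local init */
    static int global_nbthread;         /* taken from "nbthread" */

    /* hypothetical per-thread startup path */
    static void thread_startup_sync(void)
    {
        atomic_fetch_add(&threads_ready, 1);

        /* Busy-wait until every thread has completed its init. This is
         * cheap when startup is short, but burns user CPU if one thread
         * is delayed (e.g. by the scheduler or a hypervisor).
         */
        while (atomic_load(&threads_ready) < global_nbthread)
            sched_yield();
    }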

> If a single-threaded haproxy process can run with 0.015 user-cpu-usage, I
> would not have expected it to require 4.4 on a 16-core cpu for the same
> startup & actions.

I agree.

> Whereas it should be easier to spawn a second thread with the already
> parsed config, not more expensive?

"In theory" it should not. In practice the theory seems wrong. I've run all
the tests with nbthread 64 on my 8-thread machine to force conflicts and be
sure there was no degradation :-/  Just out of curiosity, is it running in
a VM on your side ?  

> Even if it parses the config once in each thread separately, it doesn't
> make sense to me.

It doesn't make sense to me either at this point, unless maybe the CPU
is stolen for a long time and lets other threads spin. In that case I
should probably switch to a heavier synchronization mechanism, like a
mutex, that will allow the waiting threads to sleep during this time.
While that's not reasonable on the data path, it's OK during startup.
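
To make it concrete, what I have in mind is something along these lines
(again a hypothetical sketch, not the current code): the last thread to
finish its init wakes the others, and the waiting threads sleep in
pthread_cond_wait() instead of burning CPU.

    #include <pthread.h>

    static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  init_cond = PTHREAD_COND_INITIALIZER;
    static int threads_ready;           /* threads done with local init */
    static int global_nbthread;         /* taken from "nbthread" */

    /* sleeping variant: waiters block in the kernel instead of spinning,
     * so a delayed thread costs no CPU to the others.
     */
    static void thread_startup_sync_sleeping(void)
    {
        pthread_mutex_lock(&init_lock);
        if (++threads_ready == global_nbthread)
            pthread_cond_broadcast(&init_cond);
        else
            while (threads_ready < global_nbthread)
                pthread_cond_wait(&init_cond, &init_lock);
        pthread_mutex_unlock(&init_lock);
    }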

> So I also thought to try with 'nbthread 8', and that still seems to be
> 'alright' as seen below, so I guess with the default of nbthread 16 the h1
> and h2 get into some conflict, fighting over the available cores?

Yes definitely, even though I tried to trigger this and didn't notice
anything alarming.

> And haproxy by default will use all cores since 2.0-dev3, so I guess it
> might cause some undesirable effects in the field once it gets released
> and isn't the only process running on a machine. And even if it is the
> only intensive process, I wonder what other VMs might think about it on
> the same hypervisor, though I know VMs always give 'virtual performance'
> ;) ..

OK so these are VMs, this can definitely magnify the wait time.

> Running with nbthread 8, still relatively low user usage & test time:
> 2.0-dev6-e4d7c9d 2019/06/10
> **   h1    0.713 WAIT4 pid=68467 status=0x0002 (user 0.197443 sys 0.022781)
> **   h2    0.824 WAIT4 pid=68470 status=0x0002 (user 0.184567 sys 0.026366)
> #    top  TEST ./test/tls_health_checks-org.vtc passed (0.825)
> 
> Hope you can make sense of some of this. Sorry for not noticing earlier; I
> guess I was too focused on only the health-check duration. Or maybe it's
> just me interpreting the numbers wrongly, that's surely also an option.

I don't think you're interpreting anything wrongly, as you manage to spot
things I failed to spot. Don't be sorry, all of this is pretty useful. I'm
going to have a look at this this morning. I now see how to make things
worse in order to observe the changes: I suspect that forcing a high
nbthread and binding all threads to a single CPU should reveal the issue
much better.
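
For the record, the kind of setup I mean is roughly this global section
(just a sketch of a test configuration, assuming the usual nbthread and
cpu-map directives; pinning the whole process externally with taskset
would work as well):

    global
        nbthread 64
        # pin all threads of process 1 onto a single CPU to force contention
        cpu-map 1/all 0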

Thanks,
Willy
