Hi Pieter,

On Tue, Jun 11, 2019 at 12:38:42AM +0200, PiBa-NL wrote:
> After a bit more fiddling i noticed that the new startup method seems more
> CPU intensive.
> Also it can be seen the vtest does take a bit longer to pass 1.3sec v.s.
> 0.8sec even though the health-check durations themselves are short as
> expected. Also its using quite a bit more 'user' cpu. I was wondering if
> this is a consequence of the new init sequence, or perhaps some improvement
> is still needed there.? I noticed this after trying to run multiple tests
> simultaneously again they interfered more with each-other then they used
> to..
>
> 2.0-dev6-e4d7c9d 2019/06/10
> ** h1 1.279 WAIT4 pid=63820 status=0x0002 (user 4.484293 sys 0.054781)
> ** h2 1.394 WAIT4 pid=63823 status=0x0002 (user 4.637692 sys 0.015588)
> # top TEST ./test/tls_health_checks-org.vtc passed (1.395)
>
> Before the latest changes it used less 'user':
> 2.0-dev6-e186161 2019/06/07
> ** h1 0.783 WAIT4 pid=65811 status=0x0002 (user 1.077052 sys 0.031218)
> ** h2 0.897 WAIT4 pid=65814 status=0x0002 (user 0.341360 sys 0.037928)
> # top TEST ./test/tls_health_checks-org.vtc passed (0.899)
Hmmm, that's a bit annoying. Not dramatic, but annoying.

> And with 'nbthread 1' the user cpu usage is even more dramatically lower
> with the same test..
>
> 2.0-dev6-e4d7c9d 2019/06/10
> ** h1 0.684 WAIT4 pid=67990 status=0x0002 (user 0.015203 sys 0.015203)
> ** h2 0.791 WAIT4 pid=67993 status=0x0002 (user 0.013551 sys 0.009034)
> # top TEST ./test/tls_health_checks-org.vtc passed (0.793)
>
> 2.0-dev6-e186161 2019/06/07
> ** h1 0.682 WAIT4 pid=65854 status=0x0002 (user 0.007158 sys 0.021474)
> ** h2 0.790 WAIT4 pid=65857 status=0x0002 (user 0.007180 sys 0.014361)

Here I don't understand how a single thread can have changed anything,
given that there's no other thread to wait for: all that changed was the
addition of some inter-thread synchronization during startup, so that
threads don't start using resources that other threads have not yet
initialized (typically listening FDs).

> If a single threaded haproxy process can run with 0.015 user-cpu-usage, i
> would not have expected it to required 4.4 on a 16 core cpu for the same
> startup&actions.

I agree.

> Where it should be easier to spawn a second thread with the
> already parsed config instead of more expensive.?.

"In theory" it should not. In practice the theory seems wrong. I've run
all the tests with nbthread 64 on my 8-thread machine to force conflicts
and make sure there was no degradation :-/

Just out of curiosity, is it running in a VM on your side?

> Even if it parses the
> config once in each thread separately it doesn't make sense to me.

It doesn't make sense to me either at this point, unless maybe the CPU
is stolen for a long time and lets other threads spin. In that case I
should probably switch to a heavier synchronization mechanism, like a
mutex, which will allow the waiting threads to sleep during this time.
While that's not reasonable on the data path, it's OK during startup.

> So i thought also to try with 'nbthread 8' and that still seems to be
> 'alright' as seen below..
> so i guess with the default of nbthread 16 the h1
> and h2 get into some conflict fighting over the available cores.??.

Yes, definitely, even though I tried to trigger this and didn't notice
anything alarming.

> And
> haproxy by default will use all cores since 2.0-dev3 so i guess it might
> cause some undesirable effects in the field once it gets released and isn't
> the only process running on a machine, and even if it is the only intensive
> process, i wonder what other VM's might think about it on the same
> hypervisor, though i know VM's always give 'virtual performance' ;) ..

OK, so these are VMs; this can definitely magnify the wait time.

> Running with nbthread 8, still relatively low user usage & test time:
> 2.0-dev6-e4d7c9d 2019/06/10
> ** h1 0.713 WAIT4 pid=68467 status=0x0002 (user 0.197443 sys 0.022781)
> ** h2 0.824 WAIT4 pid=68470 status=0x0002 (user 0.184567 sys 0.026366)
> # top TEST ./test/tls_health_checks-org.vtc passed (0.825)
>
> Hope you can make sense of some of this. Sorry for not noticing earlier, i
> guess i was to focused at only the health-check-duration. Or maybe its just
> me interpreting the numbers wrongly, that's surely also an option.

I don't think you're interpreting anything wrongly; you managed to spot
things I failed to spot. Don't be sorry, all of this is pretty useful.
I'm going to have a look at it this morning. I now see how to make
things worse in order to observe the changes: I suspect that forcing a
high nbthread and binding all of the threads to a single CPU should
reveal the issue much better.

Thanks,
Willy

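The reproduction idea at the end (high nbthread, everything bound to one CPU)
could be tried along these lines; the config path and thread count are
illustrative, not taken from the thread:

```shell
# Pin the whole haproxy process (and thus all its threads) to CPU 0 so
# that threads spinning during startup visibly fight over a single core.
# Assumes "nbthread 64" in the global section of test.cfg (illustrative).
taskset -c 0 haproxy -f ./test.cfg -db

# Compare against a config-check-only run as a rough startup baseline:
time haproxy -f ./test.cfg -c
```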
