> On 10 Dec 2019, at 19:15, thierry bordaz <[email protected]> wrote:
> 
> Hi William,
> 
> Thanks for these very interesting results.
> It would help if you could provide the stap scripts, so we can be sure of 
> how you are accounting for the latency.

Yes, I plan to put them into a PR soon once I have done a bit more data 
collection and polishing of the script setup. 
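
In the meantime, the scripts are all variations on the same pattern: take a 
timestamp on function entry, subtract it on return, and feed the delta into a 
statistics aggregate. A rough sketch of the do_search one (the ns-slapd path 
here is an assumption, use whatever your install has):

global entry_time, latencies

# Record the entry time of do_search per thread.
probe process("/usr/sbin/ns-slapd").function("do_search") {
    entry_time[tid()] = gettimeofday_ns()
}

# On return, accumulate the per-operation latency.
probe process("/usr/sbin/ns-slapd").function("do_search").return {
    if (entry_time[tid()]) {
        latencies <<< gettimeofday_ns() - entry_time[tid()]
        delete entry_time[tid()]
    }
}

# On ^C, print the summary and log2 histogram shown in the results below.
probe end {
    if (@count(latencies) > 0) {
        printf("Distribution of latencies (in nanoseconds) for %d samples\n",
               @count(latencies))
        printf("max/avg/min: %d/%d/%d\n",
               @max(latencies), @avg(latencies), @min(latencies))
        print(@hist_log(latencies))
    }
}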

> Also, just to be sure: is latency a synonym for response time?

Yep, here I mean the execution time of a single operation. 

> 
> Regarding the comparison of the tails (1 client vs 16 clients): it looks to 
> me that the tails are equivalent. The more clients we have, the more long 
> latencies we see, so as a first approach I would rule out a contention effect.

I disagree. The tail is much more pronounced in the 16 client version, and 
there is a significant shift of response times from the 32768 bucket to the 
65536 bucket, indicating that many operations are now being "held up". 

> 
> Regarding the ratio of shorter to longer latencies (assuming the searches 
> are equivalent), it would be interesting to know why we see such an effect. 
> One possible cause I was thinking of is the impact of the DB threads 
> (checkpointing or trickling). But even if a similar long tail exists in 
> ldbm_back, its absolute value is much lower than that of op_shared_search: 
> in short, ldbm_back accounted for at most 4ms while op_shared_search 
> accounted for 67ms. So there is something else (aci, network, frontend..).
> 
> Regarding USDT, I think it is a very good idea. However, just to share some 
> recent stap experience, I found it intrusive: in short, I did not get the 
> same throughput with and without it. In my case that was not a problem, as I 
> wanted to investigate a reproducible contention. But if we want 
> support/users/customers to use it, the performance impact in production will 
> be a valid point.

I haven't noticed any "intrusiveness" from USDT so far - how were you running 
the stap scripts? 
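
For reference, the probes I have in mind are the <sys/sdt.h> DTRACE_PROBE 
style markers, which compile down to a single nop at the probe site when 
nothing is attached. On the stap side they are addressed with .mark() rather 
than .function(), which also avoids needing debuginfo. A minimal sketch (the 
marker names here are placeholders, not what will land in the PR):

global op_start, op_latencies

# Start timing when the hypothetical op_search_begin marker fires.
probe process("/usr/sbin/ns-slapd").mark("op_search_begin") {
    op_start[tid()] = gettimeofday_ns()
}

# Stop timing at the matching op_search_end marker.
probe process("/usr/sbin/ns-slapd").mark("op_search_end") {
    if (op_start[tid()]) {
        op_latencies <<< gettimeofday_ns() - op_start[tid()]
        delete op_start[tid()]
    }
}

probe end {
    if (@count(op_latencies) > 0)
        print(@hist_log(op_latencies))
}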

> 
> best regards
> thierry
> 
> 
> 
> On 12/9/19 6:16 AM, William Brown wrote:
>> Hi all,
>> 
>> Following last week's flamegraph runs, I wanted to find more detail on 
>> exactly what was happening. While the flamegraphs highlighted that a change 
>> existed between the single and multithreaded servers, they did not help to 
>> isolate where the change was occurring.
>> 
>> To understand this I have started to work on a set of systemtap scripts that 
>> we can use to profile 389ds. These will be included in a future PR.
>> 
>> Here are the histograms from an initial test of profiling "do_search":
>> 
>> 1 thread
>> 
>> stap test_do_search.stp
>> ^CDistribution of latencies (in nanoseconds) for 441148 samples
>> max/avg/min: 35911542/85694/38927
>>     value |-------------------------------------------------- count
>>      8192 |                                                        0
>>     16384 |                                                        0
>>     32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                 167285
>>     65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  239280
>>    131072 |@@@@@                                               25788
>>    262144 |@                                                    8530
>>    524288 |                                                      252
>>   1048576 |                                                        6
>>   2097152 |                                                        1
>>   4194304 |                                                        3
>>   8388608 |                                                        0
>>  16777216 |                                                        2
>>  33554432 |                                                        1
>>  67108864 |                                                        0
>> 134217728 |                                                        0
>> 
>> 16 thread
>> 
>>  stap test_do_search.stp
>> ^CDistribution of latencies (in nanoseconds) for 407806 samples
>> max/avg/min: 100315928/112407/39368
>>     value |-------------------------------------------------- count
>>      8192 |                                                        0
>>     16384 |                                                        0
>>     32768 |@@@@@@@@@@@@@@@@@@@@                               100281
>>     65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  249656
>>    131072 |@@@@@@@                                             37837
>>    262144 |@@@                                                 18322
>>    524288 |                                                     1171
>>   1048576 |                                                      203
>>   2097152 |                                                       90
>>   4194304 |                                                       74
>>   8388608 |                                                       83
>>  16777216 |                                                       58
>>  33554432 |                                                       21
>>  67108864 |                                                       10
>> 134217728 |                                                        0
>> 268435456 |                                                        0
>> 
>> 
>> It's interesting to note the tail latency here: on the 16 thread version we 
>> see about 67,000 fewer samples in the 32768 bucket, shifting mostly into the 
>> 131072 and 262144 buckets, as well as a much greater number of calls in the 
>> tail. In the 1 thread run no operation made it to 67108864, but 10 did in 
>> the 16 thread run, along with ~200 more that are above 1048576, and ~1500 
>> more that are greater than 524288. This kind of tailing means we have 
>> "spikes" of latency throughout the execution, which then have a minor 
>> flow-on effect, causing other operations' latency to increase.
>> 
>> These are all in nanoseconds, so most operations are in do_search for 
>> 0.000131072 seconds or less, while there are spikes of operations taking 
>> between nearly 0.001048576 and 0.067108864 seconds to complete (an 8x to 
>> 512x increase in execution time).
>> 
>> From this point I took two measurements, of back_ldbm and the access log, 
>> knowing these were easy targets for potential areas of latency. Both of 
>> these were performed with the 16 thread server. I didn't need to do this on 
>> the 1 thread server because I'm now looking for tail latency, not 
>> comparisons. Because we know that there is an increase of ~1500 high 
>> latency events, we are looking for similar numbers to appear in other tail 
>> latencies.
>> 
>> 
>> ^CDistribution of back_ldbm latencies (in nanoseconds) for 364223 samples
>> max/avg/min: 4757744/31192/13622
>>    value |-------------------------------------------------- count
>>     2048 |                                                        0
>>     4096 |                                                        0
>>     8192 |@@                                                  14318
>>    16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  261265
>>    32768 |@@@@@@@@@@@@                                        63533
>>    65536 |@@@                                                 20624
>>   131072 |                                                     4062
>>   262144 |                                                      304
>>   524288 |                                                       68
>>  1048576 |                                                       33
>>  2097152 |                                                       15
>>  4194304 |                                                        1
>>  8388608 |                                                        0
>> 16777216 |                                                        0
>> 
>> This shows some tail latency in back_ldbm, however it's infrequent: only a 
>> handful of spikes, and they taper off quickly as they approach 2097152. 
>> These should certainly be looked at, as they will likely add up to affect 
>> do_search.
>> 
>> 
>> stap -v test_access_log.stp
>> ^CDistribution of slapi_log_access latencies (in nanoseconds) for 1284350 
>> samples
>> max/avg/min: 23843992/5138/1036
>>    value |-------------------------------------------------- count
>>      256 |                                                        0
>>      512 |                                                        0
>>     1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     379710
>>     2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        361243
>>     4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  412480
>>     8192 |@@@@@@@@@@                                          83815
>>    16384 |@@@@                                                40087
>>    32768 |                                                     5076
>>    65536 |                                                     1253
>>   131072 |                                                      406
>>   262144 |                                                      130
>>   524288 |                                                       72
>>  1048576 |                                                       33
>>  2097152 |                                                       16
>>  4194304 |                                                       19
>>  8388608 |                                                        1
>> 16777216 |                                                        9
>> 33554432 |                                                        0
>> 67108864 |                                                        0
>> 
>> 
>> From this I can see that while some tail latency exists in some conditions 
>> of back_ldbm, there is a much longer and more present tail in the access 
>> log, with nearly 2000 operations at or exceeding the 65536 mark - remember, 
>> in do_search most operations as a whole take around 65536, so we have 
>> latency spikes in slapi_log_access that are as large as, or larger than, 
>> entire search operations.
>> 
>> Now we can see most operations are firmly below 16384. This is what we want 
>> to see, but it's interesting that a slapi_log_access call can take up to a 
>> quarter of a whole operation's processing time in the average case.
>> 
>> What's truly telling here is the tail out to 16777216, with ~250 operations 
>> exceeding 262144 ns. This shows that within slapi_log_access we have large 
>> spiked delays in the behaviour, which will affect the tail latency of 
>> do_search as a whole, but also cause other slapi_log_access operations to 
>> "hold up".
>> 
>> My next steps from here will be:
>> 
>> * To add USDT probes to the logs and back_ldbm to get better, fine detail 
>> about what is causing the excessive latency.
>>   * these probes are also needed to resolve what appears to be a symbol 
>> resolution issue with systemtap when optimisations are enabled.
>> * To add probes in other parts of the code base to get better visualisation 
>> of where, and for how long, time is spent through an operation.
>> * To run a lock contention profile (I was unable to do this today due to 
>> bugs in systemtap; a rough sketch of one possible approach is below).
>> * To document the setup procedures.
>> * To commit all of these into a PR.
>> * To document some actionable improvements we can make to improve server 
>> performance.
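>> 
>> A rough sketch of one possible way to approximate the lock contention 
>> profile, once the stap bugs are sorted, is to histogram how long threads 
>> spend inside PR_Lock (the libnspr4 path here is an assumption):
>> 
>> global lock_wait_start, lock_waits
>> 
>> # Time spent acquiring an NSPR lock approximates time lost to contention.
>> probe process("/usr/lib64/libnspr4.so").function("PR_Lock") {
>>     lock_wait_start[tid()] = gettimeofday_ns()
>> }
>> 
>> probe process("/usr/lib64/libnspr4.so").function("PR_Lock").return {
>>     if (lock_wait_start[tid()]) {
>>         lock_waits <<< gettimeofday_ns() - lock_wait_start[tid()]
>>         delete lock_wait_start[tid()]
>>     }
>> }
>> 
>> probe end {
>>     if (@count(lock_waits) > 0)
>>         print(@hist_log(lock_waits))
>> }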
>> 
>> 
>> Hope that is interesting to everyone,
>> 
>> 
>> --
>> Sincerely,
>> 
>> William
> 

--
Sincerely,

William
_______________________________________________
389-devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/[email protected]
