> On 11 Dec 2019, at 20:29, thierry bordaz <[email protected]> wrote:
>
> On 12/11/19 1:21 AM, William Brown wrote:
>>
>>> On 10 Dec 2019, at 19:15, thierry bordaz <[email protected]> wrote:
>>>
>>> Hi William,
>>>
>>> Thanks for these very interesting results.
>>> It would help if you could provide the stap scripts, to make clear exactly what you are counting as latency.
>> Yes, I plan to put them into a PR soon once I have done a bit more data collection and polishing of the script setup.
>>
>>> Also, just to be sure, is latency a synonym for response time?
>> Yep, here I mean the execution time of a single operation.
>>
>>> Regarding the comparison of the tails (1 client vs 16 clients), they look equivalent to me: the more clients we have, the more long latencies we see. So as a first approach I would rule out a contention effect.
>> I disagree, the tail is much more pronounced in the 16 client version, and there is a significant shift of response times from the 32768 bucket to 65536, indicating that many operations are now being "held up".
>
> You are probably right.
> There is a response time shift, but I would expect a major contention effect to shift more and flatten the graph toward higher response times.
> Whatever the answer is, I think the important point is to agree on the method, and the reports do show something suspicious.
In a highly concurrent system like DS we shouldn't be seeing tail latency or spikes; we should be seeing all responses within a really small window, especially when the searches are all the "same query" for a single uid. So that's why I'm looking at the spikes first.

>
>>
>>> Regarding the ratio of shorter to longer latencies (assuming the searches are equivalent), it is interesting to understand why we have such an effect. One possible cause I was thinking of is the impact of the DB threads (checkpointing or trickling). But even if a similar long tail exists in ldbm_back, its absolute value is much lower than in op_shared_search: in short, ldbm_back accounted for at most 4ms while op_shared_search accounted for 67ms. So there is something else (aci, network, frontend, ...).
>>>
>>> Regarding USDT, I think it is a very good idea. However, just to share some recent stap experience, I found it intrusive. In short, I did not get the same throughput with and without it. In my case that was not a problem, as I wanted to investigate a reproducible contention. But if we want support/users/customers to use it, the performance impact in production will be a valid concern.
>> I haven't noticed any "intrusiveness" from USDT so far? How were you running the stap scripts?
> I did not add probes in DS. I was just using stap to gather per-operation durations of specific functions (like plugin, backend or core).
>
> I do not recall the level of "intrusiveness", as I was more interested in contention data than in performance numbers.

Hmmm okay. I've found stap extremely fast and it doesn't get in the way, so unless I saw your stap scripts I'm not sure .....

>
>>
>>> best regards
>>> thierry
>>>
>>> On 12/9/19 6:16 AM, William Brown wrote:
>>>> Hi all,
>>>>
>>>> Following last week's flamegraph runs, I wanted to find more details on exactly what was happening. While the flamegraphs highlighted that a change existed between single and multithreaded servers, they did not help to isolate where the change was occurring.
>>>>
>>>> To understand this I have started to work on a set of systemtap scripts that we can use to profile 389ds. These will be included in a future PR.
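
To give a rough idea of the shape of these scripts before the PR lands, here is a minimal sketch of a per-function latency histogram. The ns-slapd path, the file name and the choice of do_search are illustrative assumptions, not the final script:

# do_search_latency.stp - sketch only, not the script from the PR.
# Records the wall-clock duration of each do_search() call and prints a
# log2 histogram of the collected durations when the script is stopped.

global entry_ts, latency

probe process("/usr/sbin/ns-slapd").function("do_search") {
    entry_ts[tid()] = gettimeofday_ns()
}

probe process("/usr/sbin/ns-slapd").function("do_search").return {
    t = entry_ts[tid()]
    if (t) {
        latency <<< gettimeofday_ns() - t
        delete entry_ts[tid()]
    }
}

probe end {
    printf("Distribution of latencies (in nanoseconds) for %d samples\n",
           @count(latency))
    printf("max/avg/min: %d/%d/%d\n",
           @max(latency), @avg(latency), @min(latency))
    print(@hist_log(latency))
}

Run it as root while the load is applied and interrupt with Ctrl-C to print the distribution; the same pattern can target ldbm_back or slapi_log_access by changing the probed function (and, for functions provided by a shared library, the probed object).
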
>>>>
>>>> Here are the histograms from an initial test of profiling "do_search".
>>>>
>>>> 1 thread
>>>>
>>>> stap test_do_search.stp
>>>> ^CDistribution of latencies (in nanoseconds) for 441148 samples
>>>> max/avg/min: 35911542/85694/38927
>>>>     value |-------------------------------------------------- count
>>>>      8192 | 0
>>>>     16384 | 0
>>>>     32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 167285
>>>>     65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 239280
>>>>    131072 |@@@@@ 25788
>>>>    262144 |@ 8530
>>>>    524288 | 252
>>>>   1048576 | 6
>>>>   2097152 | 1
>>>>   4194304 | 3
>>>>   8388608 | 0
>>>>  16777216 | 2
>>>>  33554432 | 1
>>>>  67108864 | 0
>>>> 134217728 | 0
>>>>
>>>> 16 thread
>>>>
>>>> stap test_do_search.stp
>>>> ^CDistribution of latencies (in nanoseconds) for 407806 samples
>>>> max/avg/min: 100315928/112407/39368
>>>>     value |-------------------------------------------------- count
>>>>      8192 | 0
>>>>     16384 | 0
>>>>     32768 |@@@@@@@@@@@@@@@@@@@@ 100281
>>>>     65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 249656
>>>>    131072 |@@@@@@@ 37837
>>>>    262144 |@@@ 18322
>>>>    524288 | 1171
>>>>   1048576 | 203
>>>>   2097152 | 90
>>>>   4194304 | 74
>>>>   8388608 | 83
>>>>  16777216 | 58
>>>>  33554432 | 21
>>>>  67108864 | 10
>>>> 134217728 | 0
>>>> 268435456 | 0
>>>>
>>>> It's interesting to note the tail latency here: in the 16 thread version we see ~67000 fewer samples in the 32768 bucket, shifting mostly into the 131072 and 262144 buckets, as well as a much greater number of calls in the tail. In the 1 thread run no operation reached 67108864, but 10 did with 16 threads, along with ~200 more in the 1048576 bucket and ~1500 more at or above 524288. This kind of tailing means we have "spikes" of latency throughout the execution, which then have a minor flow-on effect, increasing the latency of other operations.
>>>>
>>>> These are all in nanoseconds, so most operations are in do_search for 0.000131072 seconds or less, while there are spikes of operations taking between roughly 0.001048576 and 0.067108864 seconds to complete (an 8x to 512x increase in execution time).
>>>>
>>>> From this point I took two measurements, of back_ldbm and the access log, knowing these were easy targets as potential sources of latency. Both were taken with the 16 thread server; I didn't need to repeat them on the 1 thread server because I'm now looking for tail latency, not comparisons. Because we know there is an increase of ~1500 high-latency events, we are looking for similar numbers to appear in other tails.
>>>>
>>>> ^CDistribution of back_ldbm latencies (in nanoseconds) for 364223 samples
>>>> max/avg/min: 4757744/31192/13622
>>>>     value |-------------------------------------------------- count
>>>>      2048 | 0
>>>>      4096 | 0
>>>>      8192 |@@ 14318
>>>>     16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 261265
>>>>     32768 |@@@@@@@@@@@@ 63533
>>>>     65536 |@@@ 20624
>>>>    131072 | 4062
>>>>    262144 | 304
>>>>    524288 | 68
>>>>   1048576 | 33
>>>>   2097152 | 15
>>>>   4194304 | 1
>>>>   8388608 | 0
>>>>  16777216 | 0
>>>>
>>>> This shows some tail latency in back_ldbm, but it's infrequent: only a handful of spikes, and they taper off quickly as they approach 2097152. They should certainly still be looked at, as they will likely add up and affect do_search.
>>>>
>>>> stap -v test_access_log.stp
>>>> ^CDistribution of slapi_log_access latencies (in nanoseconds) for 1284350 samples
>>>> max/avg/min: 23843992/5138/1036
>>>>     value |-------------------------------------------------- count
>>>>       256 | 0
>>>>       512 | 0
>>>>      1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 379710
>>>>      2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 361243
>>>>      4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 412480
>>>>      8192 |@@@@@@@@@@ 83815
>>>>     16384 |@@@@ 40087
>>>>     32768 | 5076
>>>>     65536 | 1253
>>>>    131072 | 406
>>>>    262144 | 130
>>>>    524288 | 72
>>>>   1048576 | 33
>>>>   2097152 | 16
>>>>   4194304 | 19
>>>>   8388608 | 1
>>>>  16777216 | 9
>>>>  33554432 | 0
>>>>  67108864 | 0
>>>>
>>>> From this, I can see that while some tail latency exists in some conditions in back_ldbm, there is a much longer and more prominent tail in the access log, with nearly 2000 operations at or beyond the 65536 mark. Remember, in do_search most operations take around 65536 as a whole, so we have latency spikes in slapi_log_access that are as large as, or larger than, entire search operations.
>>>>
>>>> Now, we can see most operations are firmly below 16384. This is what we want to see, but it's interesting that a slapi_log_access call can take up to a quarter of a whole operation's processing time in the average case.
>>>>
>>>> What's truly telling here is the tail out to 16777216, with ~250 operations exceeding 262144 ns. This shows that within slapi_log_access we have large spiked delays in behaviour, which affect the tail latency of do_search as a whole and also cause other slapi_log_access operations to "hold up".
>>>>
>>>> My next steps from here will be:
>>>>
>>>> * To add USDT probes to the logs and back_ldbm to get better, finer detail about what is causing the excessive latency (rough sketch below).
>>>>   * These probes are also needed to resolve what appears to be a symbol resolution issue with systemtap when optimisations are enabled.
>>>> * To add probes in other parts of the code base to get better visibility of where time is spent throughout an operation.
>>>> * To run a lock contention profile (I was unable to do this today due to bugs in systemtap).
>>>> * To document the setup procedures.
>>>> * Commit all of these into a PR.
>>>> * Document some actionable improvements we can make to improve the server performance.
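
On the USDT probe point above, here is a rough sketch of the systemtap side, assuming hypothetical marker names (op_search_begin / op_search_end) compiled into ns-slapd via DTRACE_PROBE() from <sys/sdt.h>; the real probe names and placement are still to be decided:

# usdt_latency.stp - sketch only; the marker names are placeholders.
# Times the interval between two USDT markers instead of function
# entry/return, which also sidesteps the symbol resolution problems
# seen with optimised builds.

global begin_ts, latency

probe process("/usr/sbin/ns-slapd").mark("op_search_begin") {
    begin_ts[tid()] = gettimeofday_ns()
}

probe process("/usr/sbin/ns-slapd").mark("op_search_end") {
    t = begin_ts[tid()]
    if (t) {
        latency <<< gettimeofday_ns() - t
        delete begin_ts[tid()]
    }
}

probe end {
    print(@hist_log(latency))
}

The markers themselves are one-line DTRACE_PROBE() calls at each site in the C code, so the cost when no tracer is attached is essentially a nop, which should address the intrusiveness concern above.
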
>>>>
>>>> Hope that is interesting to everyone,
>>>>
>>>> --
>>>> Sincerely,
>>>>
>>>> William
>>
>> --
>> Sincerely,
>>
>> William

--
Sincerely,

William
_______________________________________________
389-devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/[email protected]
