On 12/11/19 1:21 AM, William Brown wrote:

On 10 Dec 2019, at 19:15, thierry bordaz <tbor...@redhat.com> wrote:

Hi William,

Thanks for these very interesting results.
It would help if you could provide the stap scripts, so we can be sure of exactly how you are
accounting for the latency.
Yes, I plan to put them into a PR soon once I have done a bit more data 
collection and polishing of the script setup.

Also, just to be sure: is latency a synonym for response time?
Yep, here I mean the execution time of a single operation.

Regarding the tail comparison between 1 client and 16 clients: it looks to me that the tails
are equivalent. The more clients we have, the more long latencies we see. So as a first
approximation I would rule out a contention effect.
I disagree. The tail is much more pronounced in the 16 client version, and there is a
significant shift of response times from the 32768 bucket to the 65536 bucket, indicating that
many operations are now being "held up".

You are probably right.
There is a response time shift, but I would expect a major contention effect to shift and flatten the graph further towards higher response times. Whatever the answer is, I think the important point is that we agree on the method, and that the reports show something suspicious.


Regarding the ratio of shorter to longer latencies (assuming the searches are equivalent), it
is interesting to understand why we see such an effect. One possible cause I was thinking of
is the impact of the DB threads (checkpointing or trickling). But while a similar long tail
exists in ldbm_back, its absolute value is much lower than in op_shared_search: in short,
ldbm_back accounted for at most 4ms while op_shared_search accounted for 67ms. So there is
something else (aci, network, frontend...).

Regarding USDT, I think it is a very good idea. However, just to share some recent stap
experience, I found it intrusive: in short, I did not get the same throughput with and without
it. In my case that was not a problem, as I wanted to investigate a reproducible contention.
But if we want support/users/customers to use it, the performance impact in production will be
a valid concern.
I haven't noticed any "intrusiveness" from USDT so far? How were you running 
the stap scripts?
I did not add probes in DS. I was just using stap to gather per-operation durations of specific functions (such as plugin, backend or core functions).

I do not recall the level of "intrusiveness", as I was looking more for contention data than for performance numbers.
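
The pattern was essentially function entry/return timing, along these lines (a simplified
sketch; the binary path and function name here are only examples of the kind of backend
function I was timing, not the exact probes I used):

# Sketch: time a single server function per call and print a log2 histogram on ^C.
global start, times

probe process("/usr/sbin/ns-slapd").function("ldbm_back_search") {
    start[tid()] = gettimeofday_ns()
}

probe process("/usr/sbin/ns-slapd").function("ldbm_back_search").return {
    if (tid() in start) {
        times <<< gettimeofday_ns() - start[tid()]
        delete start[tid()]
    }
}

probe end {
    # assumes at least one call was observed before ^C
    print(@hist_log(times))
}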


best regards
thierry



On 12/9/19 6:16 AM, William Brown wrote:
Hi all,

Following last week's flamegraph runs, I wanted to find more detail on exactly what was
happening. While the flamegraphs highlighted that a change existed between the single and
multithreaded servers, they did not help to isolate where the change was occurring.

To understand this I have started to work on a set of systemtap scripts that we 
can use to profile 389ds. These will be included in a future PR.
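
As a preview, the scripts follow roughly this shape (a simplified sketch; the binary path is
an assumption for a typical install, and the exact probe points may change before the PR):

global start, durations

probe process("/usr/sbin/ns-slapd").function("do_search") {
    start[tid()] = gettimeofday_ns()
}

probe process("/usr/sbin/ns-slapd").function("do_search").return {
    if (tid() in start) {
        durations <<< gettimeofday_ns() - start[tid()]
        delete start[tid()]
    }
}

probe end {
    # printed on ^C; assumes at least one sample was collected
    printf("Distribution of latencies (in nanoseconds) for %d samples\n",
           @count(durations))
    printf("max/avg/min: %d/%d/%d\n",
           @max(durations), @avg(durations), @min(durations))
    print(@hist_log(durations))
}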

Here are the histograms from an initial test of profiling "do_search"

1 thread

stap test_do_search.stp
^CDistribution of latencies (in nanoseconds) for 441148 samples
max/avg/min: 35911542/85694/38927
     value |-------------------------------------------------- count
      8192 |                                                        0
     16384 |                                                        0
     32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                 167285
     65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  239280
    131072 |@@@@@                                               25788
    262144 |@                                                    8530
    524288 |                                                      252
   1048576 |                                                        6
   2097152 |                                                        1
   4194304 |                                                        3
   8388608 |                                                        0
  16777216 |                                                        2
  33554432 |                                                        1
  67108864 |                                                        0
134217728 |                                                        0

16 thread

  stap test_do_search.stp
^CDistribution of latencies (in nanoseconds) for 407806 samples
max/avg/min: 100315928/112407/39368
     value |-------------------------------------------------- count
      8192 |                                                        0
     16384 |                                                        0
     32768 |@@@@@@@@@@@@@@@@@@@@                               100281
     65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  249656
    131072 |@@@@@@@                                             37837
    262144 |@@@                                                 18322
    524288 |                                                     1171
   1048576 |                                                      203
   2097152 |                                                       90
   4194304 |                                                       74
   8388608 |                                                       83
  16777216 |                                                       58
  33554432 |                                                       21
  67108864 |                                                       10
134217728 |                                                        0
268435456 |                                                        0


It's interesting to note the tail latency here. In the 16 thread version we see ~67,000 fewer
samples in the 32768 bucket, shifting mostly into the 131072 and 262144 buckets, as well as a
much greater number of calls in the tail. With 1 thread no operation made it to 67108864, but
10 did with 16 threads, along with ~200 more in the 1048576 bucket and ~1500 more at 524288 or
above. This kind of tailing means we have "spikes" of latency throughout the execution, which
then have a minor flow-on effect, causing other operations' latency to increase.

These are all in nanoseconds, so this means most operations are in do_search for 0.000131072
seconds (~0.13ms) or less, while there are spikes of operations taking between roughly
0.001048576 and 0.067108864 seconds (~1ms to ~67ms) to complete, which is an 8x to 512x
increase in execution time.

From this point I took two measurements, of back_ldbm and of the access log, knowing these
were easy targets as potential sources of latency. Both were performed with the 16 thread
server; I didn't need to do this on the 1 thread server because I'm now looking for tail
latency, not comparisons. Because we know that there is an increase of ~1500 high-latency
events, we are looking for similar numbers to appear in the other tails.
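
These measurements reuse the same timing pattern as above, pointed at both entry points at
once, roughly as sketched here (the function names are my assumption of the relevant entry
points; the PR will contain the exact probes):

global start, durations

probe process("/usr/sbin/ns-slapd").function("ldbm_back_search"),
      process("/usr/sbin/ns-slapd").function("slapi_log_access") {
    start[ppfunc(), tid()] = gettimeofday_ns()
}

probe process("/usr/sbin/ns-slapd").function("ldbm_back_search").return,
      process("/usr/sbin/ns-slapd").function("slapi_log_access").return {
    if ([ppfunc(), tid()] in start) {
        durations[ppfunc()] <<< gettimeofday_ns() - start[ppfunc(), tid()]
        delete start[ppfunc(), tid()]
    }
}

probe end {
    # one histogram per probed function, printed on ^C
    foreach (fn in durations) {
        printf("Distribution of %s latencies (in nanoseconds) for %d samples\n",
               fn, @count(durations[fn]))
        print(@hist_log(durations[fn]))
    }
}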


^CDistribution of back_ldbm latencies (in nanoseconds) for 364223 samples
max/avg/min: 4757744/31192/13622
    value |-------------------------------------------------- count
     2048 |                                                        0
     4096 |                                                        0
     8192 |@@                                                  14318
    16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  261265
    32768 |@@@@@@@@@@@@                                        63533
    65536 |@@@                                                 20624
   131072 |                                                     4062
   262144 |                                                      304
   524288 |                                                       68
  1048576 |                                                       33
  2097152 |                                                       15
  4194304 |                                                        1
  8388608 |                                                        0
16777216 |                                                        0

This shows some tail latency in back_ldbm, however it's infrequent: only a handful of spikes,
and they taper off quickly as they approach 2097152. These should certainly be looked at, as
they likely add up and contribute to do_search's tail.


stap -v test_access_log.stp
^CDistribution of slapi_log_access latencies (in nanoseconds) for 1284350 
samples
max/avg/min: 23843992/5138/1036
    value |-------------------------------------------------- count
      256 |                                                        0
      512 |                                                        0
     1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     379710
     2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        361243
     4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  412480
     8192 |@@@@@@@@@@                                          83815
    16384 |@@@@                                                40087
    32768 |                                                     5076
    65536 |                                                     1253
   131072 |                                                      406
   262144 |                                                      130
   524288 |                                                       72
  1048576 |                                                       33
  2097152 |                                                       16
  4194304 |                                                       19
  8388608 |                                                        1
16777216 |                                                        9
33554432 |                                                        0
67108864 |                                                        0


From this, I can see that while some tail latency exists in some conditions of back_ldbm,
there is a much longer and more prominent tail in the access log, with nearly 2000 operations
at or exceeding the 65536 mark - remember, in do_search most operations take about 65536 ns in
total, so we have latency spikes in slapi_log_access that are as large as, or larger than,
whole search operations.

Now we can see most operations are firmly below 16384. This is what we want to see, but it's
interesting that a slapi_log_access call can "take up to" a quarter of a whole operation's
processing in the average case.

What's truly telling here is the tail out to 16777216, with ~250 operations exceeding 262144
ns. This shows that within slapi_log_access we have large spiked delays, which will affect the
tail latency of do_search as a whole, and also cause other slapi_log_access operations to
"hold up".

My next steps from here will be:

* To add USDT probes to the logs and back_ldbm to get better, fine-grained detail about what
is causing the excessive latency (a rough sketch of the stap side of such a probe is at the
end of this list).
   * these probes are also needed to resolve what appears to be a symbol resolution issue with
systemtap when optimisations are enabled.
* To add probes in other parts of the code base to get better visibility into where time is
spent, and for how long, across an operation.
* To run a lock contention profile (I was unable to do this today due to bugs in systemtap).
* To document the setup procedures.
* To commit all of these into a PR.
* To document some actionable improvements we can make to server performance.
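
For reference, once probes are added, consuming a USDT/SDT mark from stap would look roughly
like this. The mark name "op_search_prepared" is purely illustrative - it does not exist in
the code base today - and the arguments are whatever context we decide to attach:

# Hypothetical sketch only: "op_search_prepared" is an invented mark name,
# used to show the shape of a USDT consumer, not an existing probe.
probe process("/usr/sbin/ns-slapd").mark("op_search_prepared") {
    # $arg1 would carry whatever we attach to the probe, e.g. an
    # operation or connection identifier.
    printf("%d op_search_prepared conn=%d\n", gettimeofday_ns(), $arg1)
}

Once marks exist in the binary they can be listed with
stap -L 'process("/usr/sbin/ns-slapd").mark("*")', and mark probes avoid the symbol
resolution issues with optimised builds mentioned above.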


Hope that is interesting to everyone,


--
Sincerely,

William
--
Sincerely,

William

_______________________________________________
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org
