On 3 January 2014 23:22, Philippe Mouawad <[email protected]> wrote:
> Thanks Rainer, this is of great help!
> Regarding Graphite, it offers a function to compute percentiles, so we
> could, as sebb proposed, send it the raw results.

No, I was not proposing that. I did not know that Graphite could do the
calculations. I was proposing that the calculations be done in the
back-end thread before sending to Graphite. The point was to separate
the collection of the data from its processing.

However, it would be nice if Graphite supported aggregated data; that
could reduce the amount that has to be sent to it.

> My concern was about the memory and network impact of sending all
> sample results to the backend.

Likewise.

> So, if we create what you propose, do you know an existing library
> that already implements it?

See my reply to Rainer. StatCalculator does it already.
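For the record, a rough sketch of the split I have in mind, reusing
jorphan's StatCalculatorLong (I am quoting the class and method names
from org.apache.jorphan.math from memory, so treat the exact signatures
as approximate; the queue and reporting wiring here is purely
illustrative, not a proposed API):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    import org.apache.jorphan.math.StatCalculatorLong;

    // Sketch: sampler threads only enqueue elapsed times; a single
    // consumer thread owns the (non-thread-safe) StatCalculatorLong,
    // so the statistics never need locking.
    public class StatsConsumer implements Runnable {

        private final BlockingQueue<Long> samples = new LinkedBlockingQueue<>();
        private final StatCalculatorLong calc = new StatCalculatorLong();

        // Called from sampler threads: a cheap, non-blocking hand-off.
        public void record(long elapsedMillis) {
            samples.offer(elapsedMillis);
        }

        @Override
        public void run() {
            long nextReport = System.currentTimeMillis() + 5000;
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Long sample = samples.poll(500, TimeUnit.MILLISECONDS);
                    if (sample != null) {
                        calc.addValue(sample.longValue());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
                if (System.currentTimeMillis() >= nextReport) {
                    // Only this thread ever touches calc, so reading it
                    // here is safe; e.g. send calc.getPercentPoint(0.90)
                    // to Graphite at this point.
                    nextReport += 5000;
                }
            }
        }
    }

Sampler threads never block on the statistics, and since only one
thread ever touches the calculator, its lack of thread-safety stops
mattering.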
> On Friday, January 3, 2014, Rainer Jung wrote:
>
>> On 03.01.2014 13:57, [email protected] wrote:
>> > https://issues.apache.org/bugzilla/show_bug.cgi?id=55932
>> >
>> > --- Comment #6 from Sebb <[email protected]> ---
>> > I have been having a look at the implementation.
>> >
>> > I don't really see that it needs Commons Math; we already have
>> > StatCalculator, which handles percentiles and more.
>> >
>> > Likewise, does it really need Commons Pool?
>> > It seems wrong to have to have 2 separate pools of SocketOutputStream
>> > instances. How many of these would there be?
>> >
>> > Also, DescriptiveStatistics is not thread-safe (nor is StatCalculator).
>> >
>> > If we do implement something like this, I think the data processing
>> > needs either to be carefully synchronised, or the raw data should be
>> > sent to a separate singleton background thread.
>>
>> FWIW: I always get a bit nervous when percentiles are calculated.
>> Percentiles are expensive to calculate if one needs exact results for
>> given percentages (50%, 99%, 99.9%, etc.). In that case one needs to
>> keep all values as an ordered list to calculate the percentiles. For a
>> long-running test that would be expensive in terms of memory but also
>> in terms of CPU (sorting). There is no way of exactly merging
>> percentiles from interim statistical data.
>>
>> Sometimes approximations are enough. By approximation I don't mean
>> estimated data, but percentages which are not exactly the ones you are
>> after. E.g. you would get a 48% value instead of a 50% value, or a
>> 99.02% value instead of a 99% value.
>>
>> Suppose you knew (configured) that only very few samples will take
>> longer than 1000ms; then one could create fixed bins for e.g. 10ms,
>> 15ms, 20ms, 25ms, 30ms, 40ms, 50ms, 75ms, 100ms, 150ms, 200ms, 250ms,
>> 300ms, 400ms, 500ms, 750ms and 1000ms. Now whenever a sample finishes,
>> you count it in the bin it belongs to and do not save the value itself
>> (of course you can still log it). At any time you can then look at the
>> bin counters and cheaply produce quantiles. For example, suppose the
>> bin counters look like this:
>>
>> Duration  binCount
>> 10ms      3
>> 15ms      2
>> 20ms      5
>> 25ms      10
>> 30ms      8
>> 40ms      20
>> 50ms      28
>> 75ms      100
>> 100ms     230
>> 150ms     610
>> 200ms     780
>> 250ms     530
>> 300ms     220
>> 400ms     200
>> 500ms     80
>> 750ms     90
>> 1000ms    50
>> >1000ms   30
>>
>> Now we sum the bin counts into cumulative counts and take percentages:
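To make the bin idea concrete, a quick sketch in plain Java (the bin
limits are the example values from the table above; nothing here is
JMeter-specific):

    import java.util.concurrent.atomic.AtomicLongArray;

    public class DurationHistogram {

        // Upper bounds of the bins, in milliseconds (the example values).
        private static final long[] LIMITS_MS = {
            10, 15, 20, 25, 30, 40, 50, 75, 100, 150,
            200, 250, 300, 400, 500, 750, 1000
        };

        // One extra slot counts everything above the last limit (">1000ms").
        private final AtomicLongArray counts =
            new AtomicLongArray(LIMITS_MS.length + 1);

        // O(bins) per sample, and no sample values are retained.
        public void record(long elapsedMillis) {
            int i = 0;
            while (i < LIMITS_MS.length && elapsedMillis > LIMITS_MS[i]) {
                i++;
            }
            counts.incrementAndGet(i);
        }

        // Returns the bin limit whose cumulative percentage first reaches
        // the requested fraction (e.g. 0.50): the "48% instead of 50%"
        // approximation described above.
        public long approxPercentile(double fraction) {
            long total = 0;
            for (int i = 0; i < counts.length(); i++) {
                total += counts.get(i);
            }
            if (total == 0) {
                return 0;
            }
            long threshold = (long) Math.ceil(fraction * total);
            long cumulative = 0;
            for (int i = 0; i < LIMITS_MS.length; i++) {
                cumulative += counts.get(i);
                if (cumulative >= threshold) {
                    return LIMITS_MS[i];
                }
            }
            return Long.MAX_VALUE; // the answer lies in the ">1000ms" bin
        }
    }

A binary search over the limits would be better with many more bins,
but the point stands: memory use is fixed no matter how long the test
runs.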
>> Duration  binCount  Cum.Count  Cum.Pct  binPct
>> 10ms      3         3          0.10%    0.10%
>> 15ms      2         5          0.17%    0.07%
>> 20ms      5         10         0.33%    0.17%
>> 25ms      10        20         0.67%    0.33%
>> 30ms      8         28         0.93%    0.27%
>> 40ms      20        48         1.60%    0.67%
>> 50ms      28        76         2.54%    0.93%
>> 75ms      100       176        5.87%    3.34%
>> 100ms     230       406        13.55%   7.68%
>> 150ms     610       1016       33.91%   20.36%
>> 200ms     780       1796       59.95%   26.03%
>> 250ms     530       2326       77.64%   17.69%
>> 300ms     220       2546       84.98%   7.34%
>> 400ms     200       2746       91.66%   6.68%
>> 500ms     80        2826       94.33%   2.67%
>> 750ms     90        2916       97.33%   3.00%
>> 1000ms    50        2966       99.00%   1.67%
>> >1000ms   30        2996       100.00%  1.00%
>>
>> The Cum.Pct column could be used instead of the percentiles. One does
>> not need to keep all sample values around and sort them, but one also
>> does not get equidistant percentiles (10%, 11%, 12%, ...).
>>
>> Making the table more fine-grained by using more rows is cheap. I have
>> chosen a short table here to make the example easier to understand. The
>> duration limits I chose for the bins were human-friendly (integral
>> numbers, often divisible by 10 or 100), but one could also use a
>> mathematically strict series, e.g. 100 logarithmic steps, or 10
>> logarithmic steps for each factor of 10 in duration. The bin limits
>> would then increase by about 26% from bin to bin, e.g. 100ms, 126ms,
>> 158ms, 200ms, 251ms, 316ms, 398ms, 501ms, 631ms, 794ms, 1000ms.
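Generating such a series is cheap, by the way. A throwaway sketch: with
10 steps per decade the ratio is 10^(1/10) ≈ 1.259, which is the ~26%
step mentioned above:

    // Sketch: log-spaced bin limits, stepsPerDecade steps per factor of
    // 10 in duration. For stepsPerDecade = 10 each limit is ~26% above
    // the previous one, reproducing the series 100, 126, ..., 794, 1000.
    public class LogBins {
        public static long[] limits(long startMillis, long endMillis,
                                    int stepsPerDecade) {
            double ratio = Math.pow(10, 1.0 / stepsPerDecade);
            int n = (int) Math.round(stepsPerDecade
                    * Math.log10((double) endMillis / startMillis)) + 1;
            long[] limits = new long[n];
            double limit = startMillis;
            for (int i = 0; i < n; i++) {
                limits[i] = Math.round(limit);
                limit *= ratio;
            }
            return limits;
        }

        public static void main(String[] args) {
            // Prints: 100 126 158 200 251 316 398 501 631 794 1000
            for (long l : limits(100, 1000, 10)) {
                System.out.print(l + " ");
            }
            System.out.println();
        }
    }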
>> Users who need integral percentiles, like exactly 50% or 99% values
>> instead of 48.2% or 99.05% values (for example), would need either to
>> post-process the sample logs or to choose more expensive live
>> processing (in terms of memory and CPU), be it inside JMeter or in
>> another backend. I'd expect that the non-integral percentiles above
>> are often good enough to follow a running test and get sufficient data
>> about its behavior (and are cheap enough to produce on the fly), and
>> that the integral percentiles are OK to generate during
>> post-processing and full report generation after a run.
>>
>> Just my 2c.
>>
>> Regards,
>>
>> Rainer
>
> --
> Cordialement.
> Philippe Mouawad.