Benedict created CASSANDRA-6486:
-----------------------------------

             Summary: Latency Measurement
                 Key: CASSANDRA-6486
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6486
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Benedict
            Assignee: Benedict


Latency measurement in Cassandra is currently suboptimal. Exactly what the 
latency measurements tell you isn't intuitively clear, because they are 
exponentially decaying: they amount to some view of the latency per 
(unweighted) operation over roughly the past 10 minutes, with greater weight 
given to more recent operations. This has some obvious flaws, the most notable 
being that, due to probabilistic sampling, large outlier events (e.g. GC) can 
easily be lost over a multi-minute time horizon, and even when caught are 
unlikely to appear even in the 99.9th percentile, because they account for a 
tiny fraction of events numerically.
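
To make the "lost outlier" problem concrete, here is a back-of-envelope sketch; 
the reservoir size, op rate and pause length are illustrative assumptions only, 
not Cassandra's actual configuration:

{code:java}
public class OutlierLossSketch
{
    public static void main(String[] args)
    {
        int reservoirSize = 1028;     // assumed reservoir size
        long opRate       = 10_000;   // assumed ops/sec
        long windowSecs   = 600;      // ~10 minute horizon
        long gcPauseMs    = 500;      // assumed single GC pause

        long totalOps  = opRate * windowSecs;        // 6,000,000 ops in the window
        long pausedOps = opRate * gcPauseMs / 1000;  // ops arriving during the pause

        // Chance that any one specific paused op survives uniform sampling at all:
        double pRetained = (double) reservoirSize / totalOps;

        // Fraction of all ops the pause accounts for, vs the ~0.1% needed to move p99.9:
        double fraction = (double) pausedOps / totalOps;

        System.out.printf("P(a given paused op is sampled) ~= %.4f%%%n", pRetained * 100);
        System.out.printf("Paused ops as a fraction of all ops: %.3f%% (p99.9 needs > 0.1%%)%n",
                          fraction * 100);
    }
}
{code}

With these (made-up) numbers a single 500ms pause is both unlikely to be sampled 
at all and, even if every affected op were counted, would still fall short of the 
0.1% of operations needed to register at the 99.9th percentile.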

I'm generally thinking about how we might improve on this, and want to dump my 
ideas here for discussion. I think the following things should be targeted:

1) Ability to see uniform latency measurements for different time horizons 
stretching back from the present, e.g. last 1s, 1m, 1hr and 1day
2) Ability to bound the error margin of statistics for all of these intervals
3) Protect against losing outlier measurements
4) Possibly offer the ability to weight statistics, so that longer latencies 
are not underplayed even if they are counted
5) Preferably non-blocking, memory efficient, and relatively garbage-free

(3) and (4) are the trickiest, as a theoretically sound and general approach 
isn't immediately obvious. There are a number of possibilities that spring to 
mind:
1) ensure that we have enough sample points that we are probabilistically 
guaranteed not to lose them, but over large time horizons this is problematic 
due to memory constraints, and it doesn't address (4);
2) count large events multiple times (or sub-slices of the events), based on 
e.g. average op-rate. I am not a fan of this idea because it makes possibly bad 
assumptions about behaviour and doesn't seem very theoretically sound;
3) weight the probability of retaining an event by its length (a rough sketch 
of what this might look like follows the list). The problem with this approach 
is that it ties you into (4) without offering the current view of statistics 
(i.e. unweighted operations), and it also doesn't lend itself to efficient 
implementation.
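
For reference, a minimal sketch of what (3) could look like, using 
Efraimidis/Spirakis-style weighted reservoir sampling; the class name and 
everything in it are hypothetical, purely to illustrate the idea:

{code:java}
import java.util.PriorityQueue;
import java.util.concurrent.ThreadLocalRandom;

public class LengthWeightedReservoir
{
    private static final class Entry
    {
        final long latencyMicros;
        final double key;
        Entry(long latencyMicros, double key) { this.latencyMicros = latencyMicros; this.key = key; }
    }

    private final int capacity;
    // min-heap on key, so the entry with the smallest key is evicted first
    private final PriorityQueue<Entry> heap;

    public LengthWeightedReservoir(int capacity)
    {
        this.capacity = capacity;
        this.heap = new PriorityQueue<>(capacity, (a, b) -> Double.compare(a.key, b.key));
    }

    public synchronized void update(long latencyMicros)
    {
        // Efraimidis/Spirakis: key = u^(1/weight); longer events get larger keys
        // on average, and so are proportionally more likely to be retained
        long w = Math.max(1, latencyMicros);
        double u = ThreadLocalRandom.current().nextDouble();
        double key = Math.pow(u, 1.0 / w);

        if (heap.size() < capacity)
        {
            heap.add(new Entry(latencyMicros, key));
        }
        else if (key > heap.peek().key)
        {
            heap.poll();
            heap.add(new Entry(latencyMicros, key));
        }
    }
}
{code}

Even this simple version needs a lock and a heap operation on every update, 
which is part of why I don't think it lends itself to an efficient 
implementation.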

I'm currently leaning towards a fourth approach, which attempts to hybridise 
uniform sampling and histogram behaviour by separating the sample space into 
ranges, each some multiple of the last (say 2x the size). Each range has a 
uniform sample of the events that occurred in that range, plus a count of total 
events. Ideally the size of each sample will vary based on the number of events 
occurring in its range, but with a lower bound calculated to ensure we do not 
lose events.
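
Roughly what I have in mind, sketched under names and sizes of my own choosing, 
and deliberately glossing over thread-safety and garbage (which the thread-local 
timers mentioned below are meant to address):

{code:java}
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

public class RangedUniformSample
{
    private final int sampleSizePerRange;  // lower bound on the sample size, per range
    private final long[][] samples;        // samples[range][slot] = latency
    private final AtomicLongArray counts;  // total events seen per range

    public RangedUniformSample(int ranges, int sampleSizePerRange)
    {
        this.sampleSizePerRange = sampleSizePerRange;
        this.samples = new long[ranges][sampleSizePerRange];
        this.counts = new AtomicLongArray(ranges);
    }

    // range i covers [2^i, 2^(i+1)) microseconds, i.e. each range is 2x the last
    private int rangeOf(long latencyMicros)
    {
        int r = 63 - Long.numberOfLeadingZeros(Math.max(1, latencyMicros));
        return Math.min(r, samples.length - 1);
    }

    public void update(long latencyMicros)
    {
        int r = rangeOf(latencyMicros);
        long seen = counts.incrementAndGet(r);
        if (seen <= sampleSizePerRange)
        {
            samples[r][(int) seen - 1] = latencyMicros;  // still filling the reservoir
        }
        else
        {
            // standard reservoir sampling keeps the per-range sample uniform
            long slot = ThreadLocalRandom.current().nextLong(seen);
            if (slot < sampleSizePerRange)
                samples[r][(int) slot] = latencyMicros;
        }
    }

    // weighted statistics (goal 4) can be derived by scaling each range's sample
    // by counts.get(r) / min(counts.get(r), sampleSizePerRange)
}
{code}

The per-range counts keep the unweighted view available, while scaling each 
range's sample by its count gives the weighted view in (4).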

This approach lends itself to all 5 goals above:
1) by maintaining the same structure for each time horizon, and uniformly 
sampling from all of the directly lower order time horizons to maintain it;
2) by imposing minimum sample sizes for each range;
3) ditto (2);
4) by producing time/frequency-weighted statistics using the samples and counts 
from each range;
5) with thread-local array-based timers that are synchronised with the global 
timer once every minimum period by the owning thread (a rough sketch follows 
this list)
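
A rough sketch of what (5) might look like, reusing the RangedUniformSample 
sketch above as the global sink; the buffer size and flush interval are 
placeholder values:

{code:java}
public class ThreadLocalTimer
{
    private static final int BUFFER = 128;                           // assumed per-thread buffer size
    private static final long FLUSH_INTERVAL_NANOS = 1_000_000_000L; // assumed ~1s minimum period

    private final long[] latencies = new long[BUFFER];
    private int count;
    private long lastFlushNanos = System.nanoTime();

    private final RangedUniformSample global;  // shared sink (see earlier sketch)

    public ThreadLocalTimer(RangedUniformSample global) { this.global = global; }

    // called only by the owning thread, so no synchronisation on the hot path
    public void record(long latencyMicros)
    {
        latencies[count++] = latencyMicros;
        long now = System.nanoTime();
        if (count == BUFFER || now - lastFlushNanos >= FLUSH_INTERVAL_NANOS)
            flush(now);
    }

    private void flush(long now)
    {
        for (int i = 0; i < count; i++)
            global.update(latencies[i]);  // the only cross-thread work
        count = 0;
        lastFlushNanos = now;
    }
}
{code}

Each thread would own one of these (e.g. via a ThreadLocal), so the hot path is 
a plain array write and the only shared-state work is the periodic flush 
performed by the owning thread.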

This also builds reasonably nicely on the timers I have already written for 
CASSANDRA-6199, so some of the work is already done.

Thoughts / discussion would be welcome, especially if you think I've missed 
another obvious approach.


