[ https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657705#comment-13657705 ]

Peter Bailis commented on CASSANDRA-5455:
-----------------------------------------

I've thought some more about different options for enabling metrics that are 
useful to both PBS (in an external module, if committers prefer) and anyone 
else who would be interested in finer-grained tracing.

To start, I *do* think that there is interest in a PBS module: if an eventually 
consistent store is returning stale data, how stale *is* it? Especially given 
that many (most?) Cassandra client libraries (including the Datastax 
java-driver) choose CL=ONE by default, I'd expect most users would prefer to 
understand how their choice of N, R, and W affects their latency and consistency.
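
As a timing-free, zeroth-order illustration of the stakes (PBS proper models 
write propagation over time, so this is a deliberate simplification): if the 
latest write has reached W of N replicas and a read consults R replicas chosen 
uniformly at random, the chance the read misses the write entirely is 
C(N-W, R)/C(N, R). A minimal sketch, with hypothetical names:

    public final class StalenessOdds {
        // n-choose-k, computed as a double to keep the arithmetic simple
        static double choose(int n, int k) {
            if (k < 0 || k > n) return 0;
            double c = 1;
            for (int i = 0; i < k; i++) c = c * (n - i) / (i + 1);
            return c;
        }

        // P(a read quorum of size r misses all w replicas holding the write)
        static double pStaleRead(int n, int w, int r) {
            return choose(n - w, r) / choose(n, r);
        }

        public static void main(String[] args) {
            // N=3 with R=W=1 (the CL=ONE default above): a 2-in-3 chance of
            // missing the most recent write in this simplified model
            System.out.printf("P(stale) = %.3f%n", pStaleRead(3, 1, 1));
        }
    }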

I've been contacted by several Cassandra users who are interested in and/or 
using this functionality and understand that several developers are interested 
in PBS for Riak (notably, Andy Gross highlighted PBS in his 2013 RICON East 
keynote as a useful feature Basho would like). We originally chose Cassandra 
based on our familiarity with the code base and on early discussions with 
Jonathan but we plan to integrate PBS functionality into Riak with the help of 
their committers in the near-term future. So I do think there is interest, and, 
if you're curious about *use cases* for this functionality, Shivaram and I will 
be demoing PBS in Cassandra 1.2 at the upcoming SIGMOD 2013 conference. Our 
demo proposal sketches three application vignettes, including the obvious 
integration with monitoring tools as well as automatically tuning N, R, and W 
and providing consistency and latency SLAs:
http://www.bailis.org/papers/pbs-demo-sigmod2013.pdf

So, on the more technical side, there are two statistics that aren't currently 
measured (in trunk) that are required for accurate PBS predictions. First, PBS 
requires per-server statistics. Currently, the ColumnFamily RTT read/write 
latency metrics are aggregated across all servers. Second, PBS requires a 
measure of how long a read/write request takes before it is processed (i.e., 
how long it took from a client sending each read/write request to when it was 
performed). This requires knowledge of one-way request latencies as well as 
read/write request-specific logic.

The 1.2 PBS patch provided both of these, aggregating by server and measuring 
the delay until processing. As Jonathan notes above, the latter measurement was 
conservative--for simplicity, the remote replica recorded the time at which it 
enqueued its response rather than the exact moment a read or write was 
performed. The coordinating server could then closely approximate the 
return time as RTT-(remote timestamp).
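
Concretely, under one reading of that arithmetic (assuming the replica stamps 
its response with the enqueue time expressed as an offset from the 
coordinator's send, so clocks need only be loosely synchronized; the names 
below are hypothetical, not the 1.2 patch's identifiers):

    public final class ReturnLegEstimate {
        /**
         * @param rttNanos          round-trip time measured by the coordinator
         * @param remoteOffsetNanos replica-reported response-enqueue time,
         *                          as an offset from the coordinator's send
         * @return estimated one-way return latency; conservative because the
         *         replica stamps enqueueing, not the read/write itself
         */
        static long returnLatencyNanos(long rttNanos, long remoteOffsetNanos) {
            return Math.max(0, rttNanos - remoteOffsetNanos);
        }

        public static void main(String[] args) {
            // e.g., RTT = 4.0 ms with the response enqueued 2.5 ms after the
            // send: the return leg took roughly 1.5 ms
            System.out.println(returnLatencyNanos(4000000L, 2500000L) + " ns");
        }
    }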

Given these requirements and the current state of trunk, there are a few ways 
forward to support an external PBS prediction module:

1a.) Modify Cassandra to store latency statistics on a per-server and 
per-ColumnFamily granularity. As Rick Branson has pointed out, this is useful 
for monitoring beyond PBS and can be used to detect slower replicas (a sketch 
follows this list).

1b.) Modify Cassandra to store local processing times for requests (i.e., 
expand StorageMetrics, which currently does not track the time required to, 
say, fulfill a local read stage). This also helps determine whether a 
Cassandra node is slow due to network or disk (also sketched below).

2.) Use the newly developed tracing functionality to reconstruct latencies for 
selected requests. Performing any sort of profiling will require tracing to be 
enabled (which appears somewhat heavyweight given the amount of data logged for 
each request), and reconstructing latencies from the trace table may be 
expensive (i.e., amount to a many-way self-join); a sketch follows the list.

3.) Use RTT/2 based on ColumnFamily LatencyMetrics as an inaccurate but already 
supported external predictor.

4.) Leave the PBS latency sampling as in 1.2 but remove the PBS predictor code. 
Expose the latency samples via an MBean for users like Rick who would benefit 
from it (sketched below).
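
To make these concrete, rough sketches of proposals 1a, 1b, 2, and 4 follow; 
all names are hypothetical rather than actual trunk or 1.2-patch identifiers. 
For proposal 1a, keying latency tracking by replica as well as ColumnFamily 
might look like:

    import java.net.InetAddress;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.atomic.AtomicLong;

    public final class PerReplicaLatencies {
        private static final class Stats {
            final AtomicLong count = new AtomicLong();
            final AtomicLong totalNanos = new AtomicLong();
            void record(long nanos) {
                count.incrementAndGet();
                totalNanos.addAndGet(nanos);
            }
            double meanMicros() {
                long n = count.get();
                return n == 0 ? 0 : totalNanos.get() / (n * 1000.0);
            }
        }

        // key: "<replica>/<keyspace>/<columnFamily>"
        private final ConcurrentMap<String, Stats> stats =
            new ConcurrentHashMap<String, Stats>();

        public void record(InetAddress replica, String ks, String cf, long nanos) {
            String key = replica.getHostAddress() + "/" + ks + "/" + cf;
            Stats s = stats.get(key);
            if (s == null) {
                Stats fresh = new Stats();
                s = stats.putIfAbsent(key, fresh);
                if (s == null)
                    s = fresh;
            }
            s.record(nanos);
        }

        public double meanMicros(InetAddress replica, String ks, String cf) {
            Stats s = stats.get(replica.getHostAddress() + "/" + ks + "/" + cf);
            return s == null ? 0 : s.meanMicros();
        }
    }

(The mean is a stand-in; in practice this would reuse the existing latency 
histogram machinery so slow replicas stand out at the tail, not just on 
average.)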
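
For proposal 1b, the useful split is queue wait versus execution time on the 
local read path, so "slow because of backlog" and "slow because of disk" can 
be told apart:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.atomic.AtomicLong;

    public final class LocalReadTiming {
        static final AtomicLong queueWaitNanos = new AtomicLong();
        static final AtomicLong executionNanos = new AtomicLong();

        public static void main(String[] args) {
            ExecutorService readStage = Executors.newFixedThreadPool(32);
            final long enqueued = System.nanoTime();
            readStage.execute(new Runnable() {
                public void run() {
                    long started = System.nanoTime();
                    // ... perform the local read here ...
                    long finished = System.nanoTime();
                    queueWaitNanos.addAndGet(started - enqueued);
                    executionNanos.addAndGet(finished - started);
                }
            });
            readStage.shutdown();
        }
    }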
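
For proposal 2, reconstruction with the Datastax java-driver, assuming the 
1.2-era system_traces.events layout (session_id, event_id, activity, source, 
source_elapsed, thread), which, as noted, is not guaranteed to stay fixed:

    import java.util.UUID;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public final class TraceLatencies {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            PreparedStatement ps = session.prepare(
                "SELECT source, source_elapsed, activity " +
                "FROM system_traces.events WHERE session_id = ?");
            UUID sessionId = UUID.fromString(args[0]); // a previously traced request
            // Turning these rows into per-replica one-way latencies means
            // correlating events by source and activity -- effectively the
            // many-way self-join mentioned above.
            for (Row row : session.execute(ps.bind(sessionId))) {
                System.out.printf("%s %8d us  %s%n",
                                  row.getInet("source"),
                                  row.getInt("source_elapsed"),
                                  row.getString("activity"));
            }
            cluster.shutdown();
        }
    }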
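
And for proposal 4, keeping the 1.2-style samples but exposing them over JMX 
instead of feeding a bundled predictor (standard MBean conventions, two files; 
the object name is illustrative). An external PBS module could then poll the 
samples over JMX without Cassandra carrying any predictor code:

    // PBSLatencySamplesMBean.java
    public interface PBSLatencySamplesMBean {
        long[] getReadResponseLatencySamplesNanos();
    }

    // PBSLatencySamples.java
    import java.lang.management.ManagementFactory;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;
    import javax.management.ObjectName;

    public class PBSLatencySamples implements PBSLatencySamplesMBean {
        // a bounded reservoir would be used in practice; this grows forever
        private final List<Long> readResponseNanos =
            new CopyOnWriteArrayList<Long>();

        public void record(long nanos) {
            readResponseNanos.add(nanos);
        }

        public long[] getReadResponseLatencySamplesNanos() {
            long[] out = new long[readResponseNanos.size()];
            for (int i = 0; i < out.length; i++)
                out[i] = readResponseNanos.get(i);
            return out;
        }

        public static void main(String[] args) throws Exception {
            ManagementFactory.getPlatformMBeanServer().registerMBean(
                new PBSLatencySamples(),
                new ObjectName("org.apache.cassandra.metrics:type=PBSLatencySamples"));
            Thread.sleep(Long.MAX_VALUE); // keep the JVM up for e.g. jconsole
        }
    }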

Proposal #1 has benefits for many users and seems a natural extension to the 
existing metrics but requires changes to the existing code. Proposal #2 puts 
substantial burden on an end-user and, without a fixed schema for the trace 
table, may amount to a fair bit of code munging. Proposal #3 is inaccurate but 
works on trunk. Proposal #4 is essentially 1.2.0 without the requirement to 
maintain any PBS-specific code and is a reasonable stop-gap before proposal #1. 
All of these proposals are amenable to sampling.

I'd welcome your feedback on these proposals and next steps.
                
> Remove PBSPredictor
> -------------------
>
>                 Key: CASSANDRA-5455
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5455
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 2.0
>
>         Attachments: 5455.txt
>
>
> It was a fun experiment, but it's unmaintained and the bar to understanding 
> what is going on is high.  Case in point: PBSTest has been failing 
> intermittently for some time now, possibly even since it was created.  Or 
> possibly not and it was a regression from a refactoring we did.  Who knows?

