[
https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657705#comment-13657705
]
Peter Bailis commented on CASSANDRA-5455:
-----------------------------------------
I've thought some more about different options for enabling metrics that are
useful both to PBS (in an external module, if committers prefer) and to anyone
else interested in finer-grained tracing.
To start, I *do* think that there is interest in a PBS module: if an eventually
consistent store is returning stale data, how stale *is* it? Especially given
that many (most?) Cassandra client libraries (including the Datastax
java-driver) choose CL=ONE by default, I'd expect most users would prefer to
understand how their choice of N, R, and W affects their latency and
consistency.
I've been contacted by several Cassandra users who are interested in and/or
using this functionality, and I understand that several developers are
interested in PBS for Riak (notably, Andy Gross highlighted PBS in his 2013
RICON East keynote as a useful feature Basho would like). We originally chose
Cassandra based on our familiarity with the code base and on early discussions
with Jonathan, but we plan to integrate PBS functionality into Riak with the
help of their committers in the near term. So I do think there is interest,
and,
if you're curious about *use cases* for this functionality, Shivaram and I will
be demoing PBS in Cassandra 1.2 at the upcoming SIGMOD 2013 conference. Our
demo proposal sketches three application vignettes, including the obvious
integration with monitoring tools but also automatically tuning N, R, and W
and providing consistency and latency SLAs:
http://www.bailis.org/papers/pbs-demo-sigmod2013.pdf
So, on the more technical side, there are two statistics required for accurate
PBS predictions that aren't currently measured in trunk. First, PBS requires
per-server statistics; currently, the ColumnFamily RTT read/write latency
metrics are aggregated across all servers. Second, PBS requires a measure of
how long a read/write request takes before it is processed (i.e., how long it
takes from the time a client sends a read/write request to when that request
is actually performed). This requires knowledge of one-way request latencies
as well as read/write request-specific logic.
The 1.2 PBS patch provided both of these, aggregating by server and measuring
the delay until processing. As Jonathan notes above, the latter measurement was
conservative--the remote replica recorded the time at which it enqueued its
response rather than the exact moment a read or write was performed, chiefly
for simplicity of code. The coordinating server could then closely approximate
the return time as RTT - (remote timestamp).
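For concreteness, here's roughly what that computation looks like (all names
are mine, not the 1.2 patch's code; it assumes roughly synchronized clocks and
a replica that stamps its response with the time the response was enqueued):
{code:java}
// Toy illustration of the approximation above; all names are mine, and the
// replica is assumed to stamp its response with the (roughly synchronized)
// wall-clock time at which the response was enqueued.
final class ReturnTimeSketch
{
    static final class Response
    {
        final long enqueuedAtMicros; // stamped on the replica (assumed field)

        Response(long enqueuedAtMicros)
        {
            this.enqueuedAtMicros = enqueuedAtMicros;
        }
    }

    // returnLeg = RTT - (remote timestamp, relative to the send time)
    static long returnLegMicros(long sendMicros, long receiveMicros, Response response)
    {
        long rtt = receiveMicros - sendMicros;                // full round trip
        long remote = response.enqueuedAtMicros - sendMicros; // outbound leg + replica-side delay
        return rtt - remote;                                  // what remains is the return leg
    }
}
{code}
Since the enqueue timestamp trails the moment the read/write was actually
performed, the measured delay-until-processing is an overestimate, which is
what makes the measurement conservative.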
Given these requirements and the current state of trunk, there are a few ways
forward to support an external PBS prediction module:
1a.) Modify Cassandra to store latency statistics at per-server and
per-ColumnFamily granularity. As Rick Branson has pointed out, this is useful
for monitoring beyond PBS and can be used to detect slower replicas (a rough
sketch follows this list).
1b.) Modify Cassandra to store local processing times for requests (i.e.,
expand StorageMetrics, which currently does not track the time required to,
say, fulfill a local read stage). This also helps distinguish whether a
Cassandra node is slow due to the network or to disk.
2.) Use the newly developed tracing functionality to reconstruct latencies for
selected requests. Performing any sort of profiling will require tracing to be
enabled (this appears to be somewhat heavyweight given the amount of data
logged for each request), and reconstructing latencies from the trace table
may be expensive (i.e., amount to a many-way self-join).
3.) Use RTT/2 based on the ColumnFamily LatencyMetrics as an inaccurate but
already supported external predictor (a toy version follows this list).
4.) Leave the PBS latency sampling as in 1.2 but remove the PBS predictor
code. Expose the latency samples via an MBean for users like Rick who would
benefit from them.
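To make 1a concrete, here's a rough sketch of what I have in mind (plain JDK,
all names mine--this is not a patch against the actual metrics code): keep a
bounded reservoir of latency samples keyed by replica and ColumnFamily, so
memory stays constant regardless of request volume.
{code:java}
import java.net.InetAddress;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Rough sketch, not Cassandra code: per-(replica, ColumnFamily) latency
// samples held in fixed-size reservoirs (Vitter's algorithm R), so memory
// stays bounded no matter how many requests are recorded.
final class PerReplicaLatencies
{
    private static final int RESERVOIR_SIZE = 1024;

    static final class Reservoir
    {
        private final long[] samplesMicros = new long[RESERVOIR_SIZE];
        private final Random rng = new Random();
        private long seen = 0;

        synchronized void record(long latencyMicros)
        {
            if (seen < RESERVOIR_SIZE)
            {
                samplesMicros[(int) seen] = latencyMicros;
            }
            else
            {
                // Keep the new sample with probability RESERVOIR_SIZE / (seen + 1)
                long slot = (long) (rng.nextDouble() * (seen + 1));
                if (slot < RESERVOIR_SIZE)
                    samplesMicros[(int) slot] = latencyMicros;
            }
            seen++;
        }

        synchronized long[] snapshot()
        {
            int n = (int) Math.min(seen, RESERVOIR_SIZE);
            long[] copy = new long[n];
            System.arraycopy(samplesMicros, 0, copy, 0, n);
            return copy;
        }
    }

    private final ConcurrentMap<String, Reservoir> reservoirs =
        new ConcurrentHashMap<String, Reservoir>();

    void record(InetAddress replica, String columnFamily, long latencyMicros)
    {
        String key = replica.getHostAddress() + "/" + columnFamily;
        Reservoir r = reservoirs.get(key);
        if (r == null)
        {
            Reservoir fresh = new Reservoir();
            r = reservoirs.putIfAbsent(key, fresh);
            if (r == null)
                r = fresh;
        }
        r.record(latencyMicros);
    }
}
{code}
The snapshot() method also doubles as the hook proposal 4 needs: the same
samples could be published read-only through an MBean.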
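And for proposal 3, a toy Monte Carlo predictor fed with RTT/2 one-way
estimates derived from the existing LatencyMetrics. To be clear, this is a
sketch and not the full PBS/WARS model--it ignores ack and response legs,
write-quorum waits, and replica selection bias:
{code:java}
import java.util.Random;

// Toy Monte Carlo predictor, not the full PBS/WARS model: it simply asks
// whether any of R uniformly chosen replicas had received the write by the
// time the read arrived.
final class RttHalfPredictor
{
    // oneWayMs: empirical one-way latency samples, e.g. RTT/2 from LatencyMetrics.
    // Returns an estimate of P(read observes the write) t ms after the write.
    static double pConsistent(int n, int r, double tMs, double[] oneWayMs, int trials, Random rng)
    {
        assert r <= n;
        int consistent = 0;
        for (int trial = 0; trial < trials; trial++)
        {
            // Pick r distinct replicas uniformly at random.
            boolean[] chosen = new boolean[n];
            for (int picked = 0; picked < r; )
            {
                int i = rng.nextInt(n);
                if (!chosen[i])
                {
                    chosen[i] = true;
                    picked++;
                }
            }

            boolean fresh = false;
            for (int i = 0; i < n && !fresh; i++)
            {
                if (!chosen[i])
                    continue;
                double writeArrival = oneWayMs[rng.nextInt(oneWayMs.length)];
                double readArrival = tMs + oneWayMs[rng.nextInt(oneWayMs.length)];
                fresh = writeArrival <= readArrival; // this replica saw the write in time
            }
            if (fresh)
                consistent++;
        }
        return (double) consistent / trials;
    }
}
{code}
Halving the per-replica RTT samples from the sketch above and feeding them in
as oneWayMs would give a rough consistency-vs.-time curve per ColumnFamily.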
Proposal #1 has benefits for many users and seems a natural extension to the
existing metrics but requires changes to the existing code. Proposal #2 places
a substantial burden on the end user and, without a fixed schema for the trace
table, may amount to a fair bit of code munging. Proposal #3 is inaccurate but
works on trunk. Proposal #4 is essentially 1.2.0 without the requirement to
maintain any PBS-specific code and is a reasonable stop-gap before proposal #1.
All of these proposals are amenable to sampling.
I'd welcome your feedback on these proposals and next steps.
> Remove PBSPredictor
> -------------------
>
> Key: CASSANDRA-5455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5455
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Fix For: 2.0
>
> Attachments: 5455.txt
>
>
> It was a fun experiment, but it's unmaintained and the bar to understanding
> what is going on is high. Case in point: PBSTest has been failing
> intermittently for some time now, possibly even since it was created. Or
> possibly not and it was a regression from a refactoring we did. Who knows?