[ https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462450#comment-13462450 ]
Peter Schuller commented on CASSANDRA-4705:
-------------------------------------------
99% based on what time period? If the period is too short, you won't get the
full impact, since you'll pollute the track record. If it's too large, consider
the traffic increase resulting from a prolonged hiccup. Will you be able to hide
typical GC pauses? Then you'd better have the window be higher than 250 ms. What
about full GCs? How do you determine what the p99 is for a node that shares
multiple replica sets with other nodes? If a single node goes into full GC, how
do you keep latency unaffected while still capping the number of backup
requests at a reasonable number? If you don't cap it, the optimization is more
dangerous than useful, since it just means you'll fall over under various
hard-to-predict emergent situations if you expect to take advantage of fewer
reads when provisioning your cluster. What's an appropriate cap? How do you
scale that with RF and consistency level? How do you explain this to the person
who has to figure out how much capacity is needed for a cluster?
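To make the capping concern concrete, here is a minimal, hypothetical sketch in
plain Java (nothing from any actual patch): latencies go into a small sliding
window, the p99 of that window is the speculation trigger, and a global counter
caps backup requests. Every constant in it - window size, cap, the 1 ms warm-up
floor - is exactly the kind of knob the questions above are about, and none of
them have obvious answers.

```java
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of a capped backup read. None of these names exist in
 * Cassandra; the window size, the cap and the "p99 of a sliding window"
 * trigger are all placeholder choices.
 */
public class SpeculativeReadSketch
{
    private final long[] windowMicros = new long[1024];      // recent read latencies
    private int idx = 0;
    private final AtomicInteger backupsInFlight = new AtomicInteger();
    private static final int MAX_BACKUPS = 32;                // arbitrary cap
    private final ExecutorService pool = Executors.newCachedThreadPool();

    private synchronized void record(long micros)
    {
        idx = (idx + 1) % windowMicros.length;
        windowMicros[idx] = micros;
    }

    private synchronized long p99Micros()
    {
        long[] sorted = windowMicros.clone();
        Arrays.sort(sorted);
        return Math.max(sorted[(int) (sorted.length * 0.99)], 1000); // 1 ms floor while the window warms up
    }

    /** Read from one replica; if it misses the tracked p99, try a second one (bounded). */
    public <T> T read(Callable<T> primaryReplica, Callable<T> backupReplica, long hardTimeoutMillis) throws Exception
    {
        long start = System.nanoTime();
        Future<T> first = pool.submit(primaryReplica);
        try
        {
            T value = first.get(p99Micros(), TimeUnit.MICROSECONDS);
            record(TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start));
            return value;
        }
        catch (TimeoutException slowerThanP99)
        {
            // The primary is slower than the tracked p99. Fire at most one backup,
            // and only if we are under the global cap - otherwise a full GC on one
            // node turns into extra load on its neighbours at the worst possible time.
            if (backupsInFlight.incrementAndGet() > MAX_BACKUPS)
            {
                backupsInFlight.decrementAndGet();
                return first.get(hardTimeoutMillis, TimeUnit.MILLISECONDS);
            }
            try
            {
                Future<T> second = pool.submit(backupReplica);
                long deadline = start + TimeUnit.MILLISECONDS.toNanos(hardTimeoutMillis);
                while (System.nanoTime() < deadline)
                {
                    if (first.isDone())
                        return first.get();
                    if (second.isDone())
                        return second.get();
                    Thread.sleep(1);
                }
                throw new TimeoutException("neither replica answered within the hard timeout");
            }
            finally
            {
                backupsInFlight.decrementAndGet();
            }
        }
    }
}
```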
In our case, we pretty much run all our clusters with RR turned fully up - not
necessarily for RR purposes, but for the purpose of more deterministic
behavior. You don't want things falling over when a replica goes down. If you
don't have the iops/CPU for all replicas having to process all requests for a
replica set, you're at risk of falling over (i.e., you don't scale, because
failures are common in large clusters) - unless you over-provision, but then
you might as well do full data reads to begin with.
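As a back-of-envelope illustration (this model is my own simplification, not
anything from Cassandra's code): assume each read hits one live replica, plus
all live replicas with probability read_repair_chance.

```java
/** Back-of-envelope model: each read goes to one live replica, plus to all
 *  live replicas with probability read_repair_chance. Purely illustrative. */
public class ReadLoadModel
{
    static double readsPerReplica(double readsPerSet, double readRepairChance, int liveReplicas)
    {
        return readsPerSet * (readRepairChance + (1 - readRepairChance) / liveReplicas);
    }

    public static void main(String[] args)
    {
        double n = 10_000;  // reads/s hitting one RF=3 replica set
        System.out.printf("RR=0.1, 3 live: %.0f reads/s per node%n", readsPerReplica(n, 0.1, 3)); // ~4,000
        System.out.printf("RR=0.1, 2 live: %.0f reads/s per node%n", readsPerReplica(n, 0.1, 2)); // ~5,500
        System.out.printf("RR=1.0, 3 live: %.0f reads/s per node%n", readsPerReplica(n, 1.0, 3)); // 10,000
        System.out.printf("RR=1.0, 2 live: %.0f reads/s per node%n", readsPerReplica(n, 1.0, 2)); // 10,000
    }
}
```

Under those assumptions, partial RR means losing one replica pushes roughly 38%
more reads onto each survivor at the worst possible moment; with RR at 1.0 the
per-node load is flat, so the capacity you must provision for failure is the
capacity you're already using.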
I am not arguing against the idea of backup requests, but I *strongly*
recommend simply going for the trivial and obvious route of full data reads
*first* and getting the obvious payoff with no increase in complexity (I would
even argue it's a *decrease* in complexity in terms of the behavior of the
system as a whole, especially from the perspective of a human understanding
emergent cluster behavior) - and then slowly developing something like this,
with very careful thought given to all the edge cases and implications.
I'm in favor of long-term *predictable* performance. Full data reads are a
very, very easy way to achieve that, and vastly better latency in many cases
(the bandwidth saturation case pretty much being the major exception; CPU
savings aren't really relevant with Cassandra's model if you expect to survive
nodes being down). It's also very easy for a human to understand the behavior
when looking at graphs of system activity during some event, and trying to
predict what will happen, or explain what did happen.
I really think the drawbacks of full data reads are being massively
overestimated, and the implications of not doing full data reads massively
underestimated.
> Speculative execution for CL_ONE
> --------------------------------
>
> Key: CASSANDRA-4705
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4705
> Project: Cassandra
> Issue Type: Improvement
> Affects Versions: 1.2.0
> Reporter: Vijay
> Assignee: Vijay
> Priority: Minor
>
> When read_repair is not 1.0, we send the request to one node for some of the
> requests. When a node goes down or when a node is too busy the client has to
> wait for the timeout before it can retry.
> It would be nice to watch for latency and execute an additional request to a
> different node, if the response is not received within average/99% of the
> response times recorded in the past.
> CASSANDRA-2540 might be able to solve the variance when read_repair is set to
> 1.0
> 1) Maybe we need to use metrics-core to record various percentiles
> 2) Modify ReadCallback.get to execute additional request speculatively.
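(Not part of the ticket: a minimal sketch of point (1) quoted above, assuming
the later Dropwizard metrics-core 3.x API (MetricRegistry/Histogram/Snapshot),
which is newer than what would have been available when this issue was filed;
the class and metric names are made up.)

```java
import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;
import java.util.concurrent.TimeUnit;

// Hypothetical tracker: record per-read latency and expose the p99 as the
// threshold after which a speculative request could be fired.
public class ReadLatencyTracker
{
    private final MetricRegistry registry = new MetricRegistry();
    private final Histogram readLatencyMicros = registry.histogram("read-latency-micros");

    public void record(long startNanos)
    {
        readLatencyMicros.update(TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - startNanos));
    }

    public long speculationThresholdMicros()
    {
        return (long) readLatencyMicros.getSnapshot().get99thPercentile();
    }
}
```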