[ https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462515#comment-13462515 ]

Vijay commented on CASSANDRA-4705:
----------------------------------

{quote}
99% based on what time period? If the period is too short, you won't get the full 
impact since you'll pollute the track record. If it's too large, consider the 
traffic increase resulting from a prolonged hiccup
{quote}
That's the hardest problem I am trying to solve right now :) Surprisingly (to me), 
the code itself for sending a backup request is not complicated.
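
To make the window question concrete, here is a minimal sketch of what I have in 
mind, assuming metrics-core's exponentially decaying histogram (shown with the 
current com.codahale.metrics package names) so the p99 is weighted toward recent 
reads rather than tied to a fixed period; the class and method names below are 
invented for illustration:

{code:java}
// Sketch only: keep read latencies in an exponentially decaying histogram so the
// 99th percentile is biased toward recent samples instead of a fixed time window.
import com.codahale.metrics.ExponentiallyDecayingReservoir;
import com.codahale.metrics.Histogram;

public class SpeculativeThreshold
{
    private final Histogram readLatencyMicros =
            new Histogram(new ExponentiallyDecayingReservoir());

    // Record the latency of every completed read.
    public void onReadComplete(long latencyMicros)
    {
        readLatencyMicros.update(latencyMicros);
    }

    // A backup request is worth sending once we have waited longer than the
    // recent 99th percentile of read latencies.
    public boolean shouldSendBackup(long elapsedMicros)
    {
        return elapsedMicros > readLatencyMicros.getSnapshot().get99thPercentile();
    }
}
{code}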

{quote}
Will you be able to hide typical GC pauses?
{quote}
Worst case we send some extra requests, which IMO is acceptable for a pause of a few milliseconds.

Most of the time on AWS the network is not that predictable, and with MR clusters 
we were reluctant to enable RR. 
This is not something new to me; we did something like this back at NFLX (we never 
gave it a fancy name :)) in the client 
(http://netflix.github.com/astyanax/javadoc/com/netflix/astyanax/Execution.html#executeAsync())
 to retry independently of the default rpc_timeout.
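
Roughly, the client-side pattern is the one below; this is a simplified sketch 
against a generic Callable-based API, not Astyanax's actual code:

{code:java}
// Illustrative only: fire the read, wait up to a speculative threshold for the
// primary, and if it has not answered yet, send the same read to a second replica
// and return whichever result arrives first.
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SpeculativeRead<T>
{
    private final ExecutorService executor = Executors.newCachedThreadPool();

    public T execute(Callable<T> primary, Callable<T> backup, long thresholdMillis)
            throws InterruptedException, ExecutionException
    {
        CompletionService<T> results = new ExecutorCompletionService<T>(executor);
        results.submit(primary);

        // Give the primary replica up to the speculative threshold.
        Future<T> first = results.poll(thresholdMillis, TimeUnit.MILLISECONDS);
        if (first != null)
            return first.get();

        // Primary is slow: send the backup request and take the first response.
        results.submit(backup);
        return results.take().get();
    }
}
{code}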

{quote}
I am not arguing against the idea of backup requests, but I strongly recommend 
simply going for the trivial and obvious route of full data reads
{quote}
I am neutral about this; the original idea was to move the above logic, which was 
done in the client, back into the server.

{quote}
Here's a good example of complexity implication that I just thought of 
...
{quote}
How about we provide an override for users with multiple kinds of requests? We can 
override via a CF setting, which will be something like a timeout... wait for 
x seconds before sending a secondary request.
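
For example, something along these lines (purely hypothetical names), with the 
per-CF delay falling back to the latency-derived threshold when nothing is 
configured:

{code:java}
// Hypothetical per-CF override: each column family can carry its own
// "wait x before sending a secondary request" delay, falling back to a
// latency-derived default (e.g. the recent p99) otherwise.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SpeculativeRetrySettings
{
    // Populated from a CF-level setting, analogous to the existing timeouts.
    private final Map<String, Long> perCfDelayMillis = new ConcurrentHashMap<String, Long>();

    public void setOverride(String columnFamily, long delayMillis)
    {
        perCfDelayMillis.put(columnFamily, delayMillis);
    }

    public long delayBeforeBackup(String columnFamily, long latencyDerivedDefaultMillis)
    {
        Long override = perCfDelayMillis.get(columnFamily);
        return override != null ? override : latencyDerivedDefaultMillis;
    }
}
{code}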
                
> Speculative execution for CL_ONE
> --------------------------------
>
>                 Key: CASSANDRA-4705
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4705
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>
> When read_repair is not 1.0, we send the request to only one node for some of the 
> requests. When a node goes down or is too busy, the client has to wait for the 
> timeout before it can retry. 
> It would be nice to watch the latency and execute an additional request to a 
> different node if the response is not received within the average/99th percentile 
> of the response times recorded in the past.
> CASSANDRA-2540 might be able to solve the variance when read_repair is set to 
> 1.0.
> 1) Maybe we need to use metrics-core to record various percentiles.
> 2) Modify ReadCallback.get to execute an additional request speculatively.
