[ 
https://issues.apache.org/jira/browse/CASSANDRA-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geoffrey Yu updated CASSANDRA-2848:
-----------------------------------
    Attachment: 2848-trunk-v2.txt

I'm attaching a second version of the patch that incorporates the changes in 
CASSANDRA-12256.

*TL;DR:* The timeout is represented as an {{OptionalLong}} that is encoded in 
{{QueryOptions}}. It is passed all the way to the replica nodes on reads 
through {{ReadCommand}}, but is only kept on the coordinator for writes.


The optional client-specified timeout is decoded as part of {{QueryOptions}}. 
Since a client may or may not specify this timeout, I opted to use an 
{{OptionalLong}} to make it clearer in the code that the value is optional. 
I've gated the use of the new timeout flag (and the encoding of the timeout) 
to protocol v5 and above.
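
A minimal sketch of the gated decoding described above, not the code from the 
attached patch. The class name, the {{TIMEOUT_FLAG}} bit value and the method 
names are hypothetical placeholders; only the use of {{OptionalLong}} and the 
v5 gate come from the description.

```java
import java.nio.ByteBuffer;
import java.util.OptionalLong;

// Illustrative only: names and the flag value are assumptions, not the patch's.
final class TimeoutDecodingSketch
{
    // Hypothetical flag bit signalling that a timeout follows in the body.
    static final int TIMEOUT_FLAG = 0x100;
    // The timeout flag is only honoured from protocol v5 onwards.
    static final int MIN_TIMEOUT_VERSION = 5;

    // Reads the timeout only when the protocol version allows it and the flag is set.
    static OptionalLong decodeTimeout(ByteBuffer body, int flags, int protocolVersion)
    {
        if (protocolVersion < MIN_TIMEOUT_VERSION || (flags & TIMEOUT_FLAG) == 0)
            return OptionalLong.empty();
        return OptionalLong.of(body.getLong());
    }

    // Falls back to the server-side default when the client sent no timeout.
    static long effectiveTimeoutMillis(OptionalLong clientTimeout, long serverDefaultMillis)
    {
        return clientTimeout.orElse(serverDefaultMillis);
    }
}
```

With this shape, callers never see a sentinel value such as -1; an absent 
timeout is always {{OptionalLong.empty()}} and the server default applies.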

On the read path, the timeout is kept within the {{ReadCommand}} and referenced 
in {{ReadCallback.awaitResults()}}. It is also serialized within the 
{{ReadCommand}} so that replica nodes can use it when setting the monitoring 
time in {{ReadCommandVerbHandler}}. Because the time at which the query 
started is not propagated to the replicas, a replica can only enforce the 
timeout from the moment its {{MessageIn}} was constructed, not from the start 
of the query.
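
To illustrate the caveat about the replica-side clock, here is a small sketch 
of how a replica deadline could be derived. All names are hypothetical and the 
arithmetic is an assumption about the general approach, not the logic in 
{{ReadCommandVerbHandler}}.

```java
import java.util.OptionalLong;

// Illustrative only: class and method names are not taken from the patch.
final class ReplicaDeadlineSketch
{
    // The replica only knows when its incoming message was constructed, so the
    // deadline is measured from that instant rather than from the query start.
    static long replicaDeadlineMillis(long messageConstructedAtMillis,
                                      OptionalLong clientTimeoutMillis,
                                      long defaultTimeoutMillis)
    {
        return messageConstructedAtMillis + clientTimeoutMillis.orElse(defaultTimeoutMillis);
    }

    // True once the current time has reached or passed the deadline.
    static boolean isExpired(long nowMillis, long deadlineMillis)
    {
        return nowMillis >= deadlineMillis;
    }
}
```

For example, if the coordinator starts the query at t=0 with a 10 ms timeout 
and the replica constructs its message at t=4, the replica's deadline under 
this sketch is t=14, so it may keep working up to 4 ms after the coordinator 
has already given up.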

On the write path, the timeout is just passed through the call stack into the 
{{AbstractWriteResponseHandler}}/{{AbstractPaxosCallback}} where it is 
referenced in the respective {{await()}} calls.
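
A rough sketch of a coordinator-side wait that honours the optional timeout. 
This is a simplified stand-in built on {{CountDownLatch}}, with hypothetical 
names; it is not the actual {{AbstractWriteResponseHandler}} code.

```java
import java.util.OptionalLong;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative only: a stand-in for a response handler, not Cassandra's class.
final class WriteAwaitSketch
{
    private final CountDownLatch responses;
    private final long startNanos;
    private final long timeoutNanos;

    WriteAwaitSketch(int blockFor, OptionalLong clientTimeoutMillis, long defaultTimeoutMillis)
    {
        this.responses = new CountDownLatch(blockFor);
        this.startNanos = System.nanoTime();
        // Use the client-supplied timeout if present, otherwise the server default.
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(clientTimeoutMillis.orElse(defaultTimeoutMillis));
    }

    // Called once per replica acknowledgement.
    void onResponse()
    {
        responses.countDown();
    }

    // Returns true if enough responses arrived before the timeout elapsed.
    boolean await()
    {
        long remaining = timeoutNanos - (System.nanoTime() - startNanos);
        try
        {
            // A non-positive remaining time makes await return immediately.
            return responses.await(remaining, TimeUnit.NANOSECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the timeout lives only in this coordinator-side object, nothing about 
the outgoing write message has to change.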

I investigated the possibility of passing the timeout to the replicas on the 
write path. To do so, we'd need to incorporate it into the outgoing internode 
message when making a write, which means either placing it in {{Mutation}} or 
creating some sort of wrapper around a mutation that can hold the timeout. 
This seemed like a very invasive change for minimal gain, since being able to 
abort an in-progress write seems less useful than being able to abort an 
in-progress read.

The patch still requires a version bump in the internode protocol to support 
the change in the serialization of {{ReadCommand}} (I haven't touched 
{{MessagingService.current_version}} yet, though). If we don't want to wait 
until 4.0, we can delay this part of the patch and just retain the custom 
timeout on the coordinator (i.e. don't serialize the timeout). Once the branch 
for 4.0 is available, we can modify the serialization so that the timeout is 
passed to the replicas.

I'd also like to include some dtests for this, namely to validate which 
timeout is being used on the coordinator. Is the accepted practice for 
something like this to log a message and assert that the log entry is present? 
I want to avoid relying on the actual observed timeout, since that can make 
the test flaky.

> Make the Client API support passing down timeouts
> -------------------------------------------------
>
>                 Key: CASSANDRA-2848
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2848
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Goffinet
>            Assignee: Geoffrey Yu
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 2848-trunk-v2.txt, 2848-trunk.txt
>
>
> Having a max server RPC timeout is good for worst case, but many applications 
> that have middleware in front of Cassandra, might have higher timeout 
> requirements. In a fail fast environment, if my application starting at say 
> the front-end, only has 20ms to process a request, and it must connect to X 
> services down the stack, by the time it hits Cassandra, we might only have 
> 10ms. I propose we provide the ability to specify the timeout on each call we 
> do optionally.



