[jira] [Comment Edited] (CASSANDRA-15352) Replica failure propagation to coordinator and client

Benedict Elliott Smith (Jira) Sat, 22 Feb 2020 03:12:24 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-15352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042515#comment-17042515 ]


Benedict Elliott Smith edited comment on CASSANDRA-15352 at 2/22/20 11:11 AM:
------------------------------------------------------------------------------

When the query is already known to have failed, but the client simply waits 
until timeout to receive notification (of a timeout, not failure); in 
particular, cases where a replica has failed to serve its part of the work but 
does not report this back to the coordinator, I think?

Though admittedly I'm unclear about its interaction with cheap quorums, which 
cannot reasonably be informed by replica failure information, since the cheap 
quorum is only needed if a replica doesn't respond (though there may be a 
subset of cases where the replica is unable to perform the work but is still 
able to respond).
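The cheap-quorum caveat above can be made concrete with a small timing model. This is a hypothetical Java sketch, not Cassandra's implementation: it assumes a write that initially goes only to full replicas and falls back to a transient replica after a fixed speculation delay, and the class, method and parameter names are invented for illustration. An explicit failure response only helps in the subset of cases noted above, where the replica cannot do the work but can still respond; a silent replica still costs the full delay.

```java
import java.util.OptionalLong;

/**
 * Hypothetical sketch, not Cassandra's actual code: how soon can a cheap-quorum
 * write fall back to a transient replica once a full replica cannot serve it?
 */
final class CheapQuorumEscalation {

    /**
     * @param failureReportedAtMillis when the full replica sent an explicit
     *        failure response, or empty if it stayed silent (for example, it is down)
     * @param speculationDelayMillis how long the coordinator waits on a silent
     *        full replica before contacting a transient replica (assumed parameter)
     * @return time, in ms after the write was sent, at which the coordinator
     *         can contact a transient replica
     */
    static long escalationTimeMillis(OptionalLong failureReportedAtMillis, long speculationDelayMillis) {
        if (failureReportedAtMillis.isPresent()) {
            // The replica could not do the work but could still respond:
            // escalate on whichever signal arrives first.
            return Math.min(failureReportedAtMillis.getAsLong(), speculationDelayMillis);
        }
        // The replica never responds: only the delay can trigger escalation,
        // so failure propagation gains nothing in this case.
        return speculationDelayMillis;
    }

    public static void main(String[] args) {
        // Full replica rejects the write after 3 ms; speculation delay is 50 ms.
        System.out.println(escalationTimeMillis(OptionalLong.of(3), 50)); // 3
        // Full replica is unreachable and never answers.
        System.out.println(escalationTimeMillis(OptionalLong.empty(), 50)); // 50
    }
}
```

If the failure report arrives after the delay has already elapsed, it changes nothing, which is why the sketch takes the minimum of the two signals.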


was (Author: benedict):
When the query is already known to have failed, but the client simply waits 
until timeout to receive notification (of a timeout, not failure)

> Replica failure propagation to coordinator and client
> -----------------------------------------------------
>
>                 Key: CASSANDRA-15352
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15352
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Messaging/Internode
>            Reporter: Alex Petrov
>            Priority: Normal
>
> We should add early reporting of replica-side errors, since currently we just 
> time out requests. On the normal read-write path this is not that important, 
> but it is a protocol change we will need in order to improve cheap quorums for 
> transient replication. It might also have a positive impact on the regular 
> read-write path, since we’d be aborting queries early instead of timing them 
> out. It can be useful for nodes that are failing or going away, too (which is 
> also one of the changes we’re planning to implement). 
> We already have means of propagating errors, both in the client protocol 
> (through <reasonmap>) and internode (through FAILURE_RSP), so we do not have 
> to extend the protocol to implement this change. It is still a change in 
> protocol behavior, though, since we’ll be sending a message where we would 
> usually silently time out.
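The early-failure idea in the description can be sketched as follows. This is a minimal, hypothetical Java illustration, not Cassandra's actual coordinator code: the class names, the `contacted` and `blockFor` parameters and the `Outcome` values are assumptions made for the sketch. It shows why an explicit failure response (such as FAILURE_RSP) lets the coordinator give up early: once more replicas have failed than the request can afford to lose, the required number of acknowledgements can no longer be reached, so waiting for the timeout only delays the inevitable.

```java
/**
 * Hypothetical sketch, not Cassandra's actual code: a coordinator-side tracker
 * that settles a request as soon as its outcome is decided instead of always
 * waiting for the request timeout.
 */
final class EarlyFailureDemo {

    enum Outcome { PENDING, SUCCESS, FAILURE }

    static final class ResponseTracker {
        private final int contacted; // replicas the request was sent to (assumed parameter)
        private final int blockFor;  // acknowledgements required for success (assumed parameter)
        private int acks;
        private int failures;

        ResponseTracker(int contacted, int blockFor) {
            if (blockFor < 1 || blockFor > contacted) {
                throw new IllegalArgumentException("need 1 <= blockFor <= contacted");
            }
            this.contacted = contacted;
            this.blockFor = blockFor;
        }

        /** A replica acknowledged its part of the work. */
        synchronized Outcome onSuccess() {
            acks++;
            return outcome();
        }

        /** A replica sent an explicit failure response (e.g. FAILURE_RSP). */
        synchronized Outcome onFailure() {
            failures++;
            return outcome();
        }

        /**
         * SUCCESS once blockFor replicas have acknowledged; FAILURE as soon as
         * more replicas have failed than the request can tolerate, because
         * blockFor acknowledgements are then unreachable. A replica that fails
         * silently never calls onFailure, so in that case the coordinator can
         * only wait for the timeout.
         */
        synchronized Outcome outcome() {
            if (acks >= blockFor) {
                return Outcome.SUCCESS;
            }
            if (failures > contacted - blockFor) {
                return Outcome.FAILURE;
            }
            return Outcome.PENDING;
        }
    }

    public static void main(String[] args) {
        // QUORUM at replication factor 3: three replicas contacted, two acks needed.
        ResponseTracker tracker = new ResponseTracker(3, 2);
        System.out.println(tracker.onFailure()); // PENDING: two replicas could still ack
        System.out.println(tracker.onFailure()); // FAILURE: at most one ack is now possible
    }
}
```

As long as each replica is counted at most once, the success and failure conditions can never hold at the same time, so the order of the two checks does not matter.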



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
