[ 
https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408427#comment-15408427
 ] 

Tyler Hobbs commented on CASSANDRA-12311:
-----------------------------------------

The patch is looking good, thanks for your work on this so far.

bq.  Since we need to pass some sort of failure code back from the replicas, I 
wanted to use the same set of failure codes between nodes as between the client 
and coordinator. So I placed the codes in a new enum RequestFailureReason and 
placed the map under RequestFailureException, meaning WriteFailureException s 
will carry this endpoint to failure code map as well. Please let me know what 
you think.

That's perfect, thanks.

bq. I placed the codes in a new enum RequestFailureReason and placed the map 
under RequestFailureException, meaning WriteFailureException s will carry this 
endpoint to failure code map as well.

Also good.

bq. Also, for what it's worth, while going through the protocol documentation I 
noticed that \[byte\] is referenced a few times but never explicitly defined 
under the "Notations" section. This could lead to ambiguity when it is used to 
define an encoding for an integer (i.e. signed or unsigned). Is this something 
we should perhaps consider adding to the specification? 

True, it would be good to specify.  I believe all uses should be considered 
unsigned, unless otherwise specified.  The only signed example I can think of 
is the {{tinyint}} type, which does specify that it's two's complement.

Regarding the patch, I believe we can remove the {{num_failures}} {{\[int\]}} 
from read/write failure messages, because the new map provides that information.

Also, I checked {{MessageIn}} deserialization and handling of parameters, and 
it seems like this change would actually be safe to make without bumping the 
internode messaging version.  Unknown parameters are ignored, which should be 
fine in this case. So, I think we can commit this in 3.x as part of the {{v5}} 
native protocol.

Last, we should make sure there are tests for this.  The dtests already have 
some coverage of {{ReadFailure}} and {{WriteFailure}}.  See 
{{paging_test.py:TestPagingWithDeletions.test_failure_threshold_deletions}}, 
{{pushed_notifications_test.py:TestVariousNotifications.tombstone_failure_threshold_message_test}},
 and {{write_failures_test.py}} for examples and test cases to build on.

> Propagate TombstoneOverwhelmingException to the client
> ------------------------------------------------------
>
>                 Key: CASSANDRA-12311
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12311
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Geoffrey Yu
>            Assignee: Geoffrey Yu
>            Priority: Minor
>              Labels: client-impacting, doc-impacting
>             Fix For: 4.x
>
>         Attachments: 12311-trunk-v2.txt, 12311-trunk-v3.txt, 12311-trunk.txt
>
>
> Right now if a data node fails to perform a read because it ran into a 
> {{TombstoneOverwhelmingException}}, it only responds back to the coordinator 
> node with a generic failure. Under this scheme, the coordinator won't be able 
> to know exactly why the request failed and subsequently the client only gets 
> a generic {{ReadFailureException}}. It would be useful to inform the client 
> that their read failed because we read too many tombstones. We should have 
> the data nodes reply with a failure type so the coordinator can pass this 
> information to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to