[
https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408427#comment-15408427
]
Tyler Hobbs commented on CASSANDRA-12311:
-----------------------------------------
The patch is looking good, thanks for your work on this so far.
bq. Since we need to pass some sort of failure code back from the replicas, I
wanted to use the same set of failure codes between nodes as between the client
and coordinator. So I placed the codes in a new enum RequestFailureReason and
placed the map under RequestFailureException, meaning WriteFailureException s
will carry this endpoint to failure code map as well. Please let me know what
you think.
That's perfect, thanks.
bq. I placed the codes in a new enum RequestFailureReason and placed the map
under RequestFailureException, meaning WriteFailureException s will carry this
endpoint to failure code map as well.
Also good.
bq. Also, for what it's worth, while going through the protocol documentation I
noticed that \[byte\] is referenced a few times but never explicitly defined
under the "Notations" section. This could lead to ambiguity when it is used to
define an encoding for an integer (i.e. signed or unsigned). Is this something
we should perhaps consider adding to the specification?
True, it would be good to specify. I believe all uses should be considered
unsigned, unless otherwise specified. The only signed example I can think of
is the {{tinyint}} type, which does specify that it's two's complement.
Regarding the patch, I believe we can remove the {{num_failures}} {{\[int\]}}
from read/write failure messages, because the new map provides that information.
Also, I checked {{MessageIn}} deserialization and handling of parameters, and
it seems like this change would actually be safe to make without bumping the
internode messaging version. Unknown parameters are ignored, which should be
fine in this case. So, I think we can commit this in 3.x as part of the {{v5}}
native protocol.
Last, we should make sure there are tests for this. The dtests already have
some coverage of {{ReadFailure}} and {{WriteFailure}}. See
{{paging_test.py:TestPagingWithDeletions.test_failure_threshold_deletions}},
{{pushed_notifications_test.py:TestVariousNotifications.tombstone_failure_threshold_message_test}},
and {{write_failures_test.py}} for examples and test cases to build on.
> Propagate TombstoneOverwhelmingException to the client
> ------------------------------------------------------
>
> Key: CASSANDRA-12311
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12311
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
> Labels: client-impacting, doc-impacting
> Fix For: 4.x
>
> Attachments: 12311-trunk-v2.txt, 12311-trunk-v3.txt, 12311-trunk.txt
>
>
> Right now if a data node fails to perform a read because it ran into a
> {{TombstoneOverwhelmingException}}, it only responds back to the coordinator
> node with a generic failure. Under this scheme, the coordinator won't be able
> to know exactly why the request failed and subsequently the client only gets
> a generic {{ReadFailureException}}. It would be useful to inform the client
> that their read failed because we read too many tombstones. We should have
> the data nodes reply with a failure type so the coordinator can pass this
> information to the client.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)