[ https://issues.apache.org/jira/browse/CASSANDRA-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405018#comment-15405018 ]
Tyler Hobbs edited comment on CASSANDRA-12311 at 8/2/16 11:34 PM:
------------------------------------------------------------------
I think the idea of generalizing this to support error codes is good. However,
I think we should sort a few things out ahead of time.
First, ideally we will have many more error codes than just the one for
tombstone overwhelming ({{TombstoneOverwhelmingException}}). If these are
described as part of the native protocol spec, then
new error codes should only really be introduced in new native protocol
versions. However, that seems like it might be overly restrictive. I wonder
if perhaps the native protocol spec should just say "this is a one-byte error
code; for the meaning of the error code, look at <link to cassandra docs
page>". Of course, this means that drivers probably would _not_ handle error
codes in a fancy way (such as tying error messages to particular codes), which
is a downside. One upside to error codes is that they are easily googleable,
though, so users could presumably figure out the meaning quickly.
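A rough sketch of the "look it up in the docs" approach, assuming a client that ships a table of known codes but must not break when the server sends a code added after the client was built. The enum name {{FailureCode}}, its constants and their numeric values are all hypothetical, not part of any spec:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry of failure codes; names and numeric values are
// illustrative only and are not defined by the native protocol spec.
enum FailureCode {
    UNKNOWN(0x0000),
    READ_TOO_MANY_TOMBSTONES(0x0001);

    final int code;

    FailureCode(int code) {
        this.code = code;
    }

    private static final Map<Integer, FailureCode> BY_CODE = new HashMap<>();
    static {
        for (FailureCode c : values())
            BY_CODE.put(c.code, c);
    }

    // Codes this client does not know map to UNKNOWN instead of failing,
    // so new codes can be added without a new protocol version.
    static FailureCode fromCode(int code) {
        return BY_CODE.getOrDefault(code, UNKNOWN);
    }

    // Keep the raw number in the message so users can search the docs for it.
    static String describe(int code) {
        FailureCode known = fromCode(code);
        return known == UNKNOWN && code != UNKNOWN.code
                ? "unrecognized error code " + code + " (see server docs)"
                : known.name();
    }
}
```

The trade-off mentioned above shows up directly: only codes in the client's table get a symbolic name; everything else is surfaced as a bare number.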
Second, we may want to combine this improvement with another one that I've been
thinking of. Instead of returning a single-byte error code, we should return a
map of endpoints to failure codes. Besides correctly handling the case where
different replicas fail for different reasons, this would tell users which
replica nodes actually had problems, something the current errors don't do.
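A minimal sketch of what the coordinator side of this could look like, assuming each failed replica reports an integer code with its failure response. {{ReplicaFailures}} and its methods are hypothetical names for illustration, not Cassandra's actual classes:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical coordinator-side collector (not Cassandra's actual code):
// rather than only counting failed replicas, remember which replica failed
// and with which code, so both can be reported to the client.
class ReplicaFailures {
    private final Map<InetAddress, Integer> failures = new LinkedHashMap<>();

    // Called once per failure response; 'code' is the replica-reported reason.
    synchronized void onFailure(InetAddress replica, int code) {
        failures.put(replica, code);
    }

    synchronized int count() {
        return failures.size();
    }

    // Immutable copy to attach to the client-facing read failure error.
    synchronized Map<InetAddress, Integer> snapshot() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(failures));
    }

    // Parses an IP literal (getByName does no DNS lookup for literals).
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    public static void main(String[] args) {
        ReplicaFailures f = new ReplicaFailures();
        f.onFailure(ip("10.0.0.1"), 1); // e.g. read too many tombstones
        f.onFailure(ip("10.0.0.2"), 0); // e.g. unspecified failure
        System.out.println(f.count() + " replicas failed: " + f.snapshot());
    }
}
```

The existing failure count falls out as the map's size, so nothing the current error reports would be lost.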
Third, I think we should go with a two-byte error code. Error responses are
rare, so the extra byte doesn't matter, and a single byte (at most 256 codes)
may become restrictive over time.
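To make the size argument concrete, here is one possible encoding of the endpoint-to-code map with two-byte codes. The layout ([int n], then per entry [byte addrLen][address bytes][short code]) is an assumption for illustration, not something the native protocol spec defines, and {{FailureMapCodec}} is a hypothetical name:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical wire layout (not from the native protocol spec):
//   [int n] then n times: [byte addrLen][addrLen address bytes][short code]
// The code is an unsigned 16-bit value: 65536 possible codes instead of 256.
final class FailureMapCodec {
    static ByteBuffer encode(Map<InetAddress, Integer> failures) {
        int size = 4;
        for (InetAddress a : failures.keySet())
            size += 1 + a.getAddress().length + 2;
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(failures.size());
        for (Map.Entry<InetAddress, Integer> e : failures.entrySet()) {
            int code = e.getValue();
            if (code < 0 || code > 0xFFFF)
                throw new IllegalArgumentException("code out of range: " + code);
            byte[] addr = e.getKey().getAddress(); // 4 bytes (IPv4) or 16 (IPv6)
            buf.put((byte) addr.length).put(addr);
            buf.putShort((short) code); // stored as 16 bits, read back unsigned
        }
        buf.flip();
        return buf;
    }

    static Map<InetAddress, Integer> decode(ByteBuffer buf) {
        int n = buf.getInt();
        Map<InetAddress, Integer> out = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            byte[] addr = new byte[Byte.toUnsignedInt(buf.get())];
            buf.get(addr);
            int code = Short.toUnsignedInt(buf.getShort()); // 0..65535
            try {
                out.put(InetAddress.getByAddress(addr), code);
            } catch (UnknownHostException e) { // address length not 4 or 16
                throw new IllegalArgumentException("bad address length " + addr.length, e);
            }
        }
        return out;
    }
}
```

Each entry costs one extra byte compared to a one-byte code, which is negligible on an error path.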
-Last, I haven't had time to verify this, but it seems like the messaging
service changes may have to wait until 4.0? I'm not sure if new parameters are
handled gracefully by nodes that don't know them yet.- *EDIT* yeah, it's
already marked for 4.x.
> Propagate TombstoneOverwhelmingException to the client
> ------------------------------------------------------
>
> Key: CASSANDRA-12311
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12311
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Geoffrey Yu
> Assignee: Geoffrey Yu
> Priority: Minor
> Fix For: 4.x
>
> Attachments: 12311-trunk-v2.txt, 12311-trunk.txt
>
>
> Right now, if a data node fails to perform a read because it ran into a
> {{TombstoneOverwhelmingException}}, it only responds back to the coordinator
> node with a generic failure. Under this scheme, the coordinator won't be able
> to know exactly why the request failed and subsequently the client only gets
> a generic {{ReadFailureException}}. It would be useful to inform the client
> that their read failed because we read too many tombstones. We should have
> the data nodes reply with a failure type so the coordinator can pass this
> information to the client.