[
https://issues.apache.org/jira/browse/CASSANDRA-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551232#comment-16551232
]
Dinesh Joshi commented on CASSANDRA-14575:
------------------------------------------
Hi [~jasobrown], I think we should avoid catching {{UnknownTableException}} in
the networking code and handling it as a one-off issue. Ideally, we should
catch any exception thrown during decoding and move on to the next message in
the pipeline. While doing so, we should keep a counter of the number of
contiguous corrupt messages we've seen. If the counter hits a certain
threshold, we close the channel and move on. This protects us from situations
where a bit flip in one of the fields corrupts the whole data stream (our
internode messages are not currently checksummed, so we're at risk of running
into this issue). It is also generic enough that application-level exceptions
do not affect the decoding pipeline. We should also try validating the
incoming states, since the decoder is a state machine. This is mainly to
protect us from future changes to the decoding method that may lead to invalid
state transitions. WDYT?
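
To make the proposal concrete, here is a rough sketch of the two ideas: a
counter of contiguous decode failures with a close threshold, and explicit
validation of decoder state transitions. Everything in it is hypothetical:
{{DecodeGuard}}, its {{State}} enum, and the transition table are illustrative
names that do not exist in the Cassandra codebase, and the real decoder's
states will differ.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Cassandra. */
final class DecodeGuard
{
    /** Illustrative decoder states; the real decoder's states may differ. */
    enum State { READ_HEADER, READ_PAYLOAD, CLOSED }

    // Assumed transition table: which states may follow which.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static
    {
        ALLOWED.put(State.READ_HEADER, EnumSet.of(State.READ_PAYLOAD, State.CLOSED));
        ALLOWED.put(State.READ_PAYLOAD, EnumSet.of(State.READ_HEADER, State.CLOSED));
        ALLOWED.put(State.CLOSED, EnumSet.noneOf(State.class));
    }

    private final int maxContiguousFailures;
    private int contiguousFailures;
    private State state = State.READ_HEADER;

    DecodeGuard(int maxContiguousFailures)
    {
        if (maxContiguousFailures < 1)
            throw new IllegalArgumentException("threshold must be >= 1");
        this.maxContiguousFailures = maxContiguousFailures;
    }

    /** A message decoded cleanly, so the run of failures is broken. */
    void onSuccess()
    {
        contiguousFailures = 0;
    }

    /**
     * Record a decode failure of any kind.
     * @return true if the caller should close the channel
     */
    boolean onFailure()
    {
        contiguousFailures++;
        if (contiguousFailures >= maxContiguousFailures)
        {
            state = State.CLOSED;
            return true;
        }
        // Assumption: the caller can skip to the next message boundary.
        state = State.READ_HEADER;
        return false;
    }

    /** Reject transitions that the table above does not allow. */
    void transitionTo(State next)
    {
        if (!ALLOWED.get(state).contains(next))
            throw new IllegalStateException("Invalid decoder transition: " + state + " -> " + next);
        state = next;
    }

    State state()
    {
        return state;
    }
}
```

The decoder would call {{onSuccess()}} after each cleanly decoded message and
{{onFailure()}} from a catch-all around the decode step, closing the channel
when the latter returns true. Resetting the counter on success means only an
unbroken run of failures, as you would expect from a corrupted stream, trips
the threshold.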
> Reevaluate when to drop an internode connection on message error
> ----------------------------------------------------------------
>
> Key: CASSANDRA-14575
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14575
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Fix For: 4.0
>
>
> As mentioned in CASSANDRA-14574, explore if and when we can safely ignore an
> incoming internode message on certain classes of failure.