[ https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279768#comment-14279768 ]
Jonathan Shook commented on CASSANDRA-8621:
-------------------------------------------
In the scenario that prompted this ticket, the streaming process appeared to
be completely stalled. The sending side of the stream logged an exception that
looked like a connection reset, while the receiving side still appeared to
consider the connection active, at least according to nodetool netstats. We
could not confirm this at the socket level because there were multiple streams
between those peers, and there is no simple way to correlate a specific stream
with a TCP session.
[~yukim]
If there is a diagnostic method we can use to get more information about
specific stalled streams, please let us know so that we can ask the user for
more data.
> For streaming operations, when a socket is closed/reset, we should
> retry/reinitiate that stream
> -----------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-8621
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8621
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jeremy Hanna
> Assignee: Yuki Morishita
>
> Currently we have a setting (streaming_socket_timeout_in_ms) that will
> time out and retry the stream operation when the TCP connection has been
> idle for a period of time. However, when the socket is closed or reset, we
> do not retry the operation. This can happen for a number of reasons,
> including a firewall sending a reset on a socket during a streaming
> operation, such as a nodetool rebuild (which necessarily streams across DCs)
> or a repair.
> Doing a retry would make the streaming operations more resilient. It would
> be good to log the retry clearly as well (with the stream session ID and node
> address).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)