[ https://issues.apache.org/jira/browse/CASSANDRA-11286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184977#comment-15184977 ]
Paulo Motta commented on CASSANDRA-11286:
-----------------------------------------

I managed to reproduce the dtest failure, which is different from the failure on CASSANDRA-10912. The problem is that the stream process is sometimes killed before any range is transferred, so the "already available. Skipping streaming." message never appears in the logs.

I submitted a [DTEST PR|https://github.com/riptano/cassandra-dtest/pull/841] that removes the non-deterministic check for the "skipping streaming" message and instead adds a stress check that the inserted data is present after the resumed bootstrap completes.

> streaming socket never times out
> --------------------------------
>
>                 Key: CASSANDRA-11286
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11286
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>
> While trying to reproduce CASSANDRA-8343 I was not able to trigger a
> {{SocketTimeoutException}} by adding an artificial sleep longer than
> {{streaming_socket_timeout_in_ms}}.
> After investigation, I detected two problems:
> * {{ReadableByteChannel}} creation via {{socket.getChannel()}}, as done in
> {{ConnectionHandler.getReadChannel(socket)}}, does not respect
> {{socket.setSoTimeout()}}, as explained in this [blog
> post|https://technfun.wordpress.com/2009/01/29/networking-in-java-non-blocking-nio-blocking-nio-and-io/]
> ** bq. The only difference between “blocking NIO” and “NIO wrapped around IO”
> is that you can’t use socket timeout with SocketChannels. Why ? Read a
> javadoc for setSocketTimeout(). It says that this timeout is used only by
> streams.
> * {{socketSoTimeout}} is never set on "follower" side, only on initiator side
> via {{DefaultConnectionFactory.createConnection(peer)}}.
> This may cause streaming to hang indefinitely, as exemplified by
> CASSANDRA-8621:
> bq. For the scenario that prompted this ticket, it appeared that the
> streaming process was completely stalled. One side of the stream (the sender
> side) had an exception that appeared to be a connection reset. The receiving
> side appeared to think that the connection was still active, at least in
> terms of the netstats reported by nodetool. We were unable to verify whether
> this was specifically the case in terms of connected sockets due to the fact
> that there were multiple streams for those peers, and there is no simple way
> to correlate a specific stream to a tcp session.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
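
The first problem in the quoted description (a channel obtained from {{socket.getChannel()}} ignoring {{socket.setSoTimeout()}}) can be illustrated with a minimal sketch. It assumes that one way to get a timeout-aware {{ReadableByteChannel}} is to wrap the socket's input stream with {{Channels.newChannel()}}, the "NIO wrapped around IO" approach from the blog post. This is not necessarily the change applied in Cassandra. The class name {{SoTimeoutChannelDemo}}, the method name, the loopback setup, and the 200 ms timeout are all hypothetical and chosen for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SoTimeoutChannelDemo {

    /**
     * Opens a loopback connection whose peer never writes, sets SO_TIMEOUT on
     * the accepted socket, and reads through a channel that wraps the socket's
     * InputStream. Returns true if the read fails with SocketTimeoutException.
     */
    static boolean streamWrappedReadTimesOut(int timeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(loopback, 0)); // port 0: any free port
            int port = server.socket().getLocalPort();

            try (SocketChannel silentPeer = SocketChannel.open(new InetSocketAddress(loopback, port));
                 SocketChannel accepted = server.accept()) {

                Socket socket = accepted.socket();
                socket.setSoTimeout(timeoutMs);

                // Reading from socket.getChannel() directly would not apply
                // SO_TIMEOUT and would block here for as long as the peer is silent.
                // Wrapping the InputStream routes the read through stream I/O,
                // which does apply the timeout.
                ReadableByteChannel in = Channels.newChannel(socket.getInputStream());
                try {
                    in.read(ByteBuffer.allocate(16));
                    return false; // data or EOF arrived; not expected in this setup
                } catch (SocketTimeoutException e) {
                    return true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + streamWrappedReadTimesOut(200));
    }
}
```

The sketch covers only the read-channel half of the description. The second problem, the timeout never being set on the "follower" side, would still require calling {{setSoTimeout()}} on sockets accepted from incoming connections.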