[ 
https://issues.apache.org/jira/browse/CASSANDRA-11286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184977#comment-15184977
 ] 

Paulo Motta commented on CASSANDRA-11286:
-----------------------------------------

I managed to reproduce the dtest failure, which is different from the failure 
on CASSANDRA-10912. The problem is that the stream process is sometimes killed 
before any range is transferred, so the "already available. Skipping 
streaming." message is never written to the logs and the check for it fails.

I submitted a [DTEST PR|https://github.com/riptano/cassandra-dtest/pull/841] 
that removes the non-deterministic check for the "skipping streaming" message 
and instead adds a stress check verifying that the inserted data is present 
after the resumed bootstrap completes.

> streaming socket never times out
> --------------------------------
>
>                 Key: CASSANDRA-11286
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11286
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>
> While trying to reproduce CASSANDRA-8343 I was not able to trigger a 
> {{SocketTimeoutException}} by adding an artificial sleep longer than 
> {{streaming_socket_timeout_in_ms}}.
> After investigation, I detected two problems:
> * {{ReadableByteChannel}} creation via {{socket.getChannel()}}, as done in 
> {{ConnectionHandler.getReadChannel(socket)}}, does not respect 
> {{socket.setSoTimeout()}}, as explained in this [blog 
> post|https://technfun.wordpress.com/2009/01/29/networking-in-java-non-blocking-nio-blocking-nio-and-io/]
> ** bq. The only difference between “blocking NIO” and “NIO wrapped around IO” 
> is that you can’t use socket timeout with SocketChannels. Why ? Read a 
> javadoc for setSocketTimeout(). It says that this timeout is used only by 
> streams.
> * {{socketSoTimeout}} is never set on the "follower" side, only on the 
> initiator side via {{DefaultConnectionFactory.createConnection(peer)}}.
> This may cause streaming to hang indefinitely, as exemplified by 
> CASSANDRA-8621:
> bq. For the scenario that prompted this ticket, it appeared that the 
> streaming process was completely stalled. One side of the stream (the sender 
> side) had an exception that appeared to be a connection reset. The receiving 
> side appeared to think that the connection was still active, at least in 
> terms of the netstats reported by nodetool. We were unable to verify whether 
> this was specifically the case in terms of connected sockets due to the fact 
> that there were multiple streams for those peers, and there is no simple way 
> to correlate a specific stream to a tcp session.
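
The first problem in the description above can be illustrated with a small 
standalone sketch. It is not Cassandra code: the class {{SoTimeoutDemo}} and 
the method {{readTimesOut}} are hypothetical names made up for this example. 
It connects to a loopback peer that never writes, sets {{setSoTimeout()}} on 
the socket, and then reads either through the raw {{SocketChannel}} or through 
a channel wrapped around {{socket.getInputStream()}}. Wrapping the stream is 
shown only as one way to make the timeout apply, assuming the stream-based 
read path is acceptable.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class, not part of Cassandra.
public class SoTimeoutDemo {

    /**
     * Returns true if a blocking read from a silent peer fails with
     * SocketTimeoutException, false if it is still blocked (or fails some
     * other way) after five times the configured timeout.
     */
    public static boolean readTimesOut(boolean wrapInputStream, int soTimeoutMs) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             SocketChannel channel = SocketChannel.open(
                     new InetSocketAddress(loopback, server.getLocalPort()));
             Socket silentPeer = server.accept()) { // accepts, then never writes

            Socket socket = channel.socket();
            socket.setSoTimeout(soTimeoutMs);

            // Either wrap the socket's InputStream or use the raw SocketChannel.
            ReadableByteChannel in = wrapInputStream
                    ? Channels.newChannel(socket.getInputStream())
                    : socket.getChannel();

            // Read on a separate thread so a read that never returns
            // cannot hang the caller.
            Future<Boolean> result = executor.submit(() -> {
                try {
                    in.read(ByteBuffer.allocate(1));
                    return false; // got data or EOF, no timeout
                } catch (SocketTimeoutException e) {
                    return true;  // SO_TIMEOUT was honored
                } catch (IOException e) {
                    return false; // e.g. channel closed while blocked
                }
            });

            try {
                return result.get(soTimeoutMs * 5L, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Still blocked well past SO_TIMEOUT; leaving the try block
                // closes the sockets and unblocks the reader thread.
                return false;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stream-wrapped channel timed out: " + readTimesOut(true, 200));
        System.out.println("raw SocketChannel timed out:      " + readTimesOut(false, 200));
    }
}
```

The read runs on a separate thread with a bounded wait, so the raw-channel 
case shows up as "still blocked after five times the timeout" rather than 
hanging the demo itself.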



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
