[ 
https://issues.apache.org/jira/browse/CASSANDRA-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280561#comment-15280561
 ] 

Paulo Motta commented on CASSANDRA-8343:
----------------------------------------

This also happens when streaming large files (see CASSANDRA-11345), so a 
Keep-Alive message from the sender will probably be necessary during the whole 
streaming process (not only after the {{WAIT_COMPLETE}} state).

As long as the streaming protocol version does not change, we could probably 
reuse the "received" message, with a special file sequence number (-1) to 
represent a keep-alive message.
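
A rough sketch of that idea, to make the proposal concrete. This is not the 
actual Cassandra streaming code: the class {{KeepAliveSketch}}, the nested 
{{Received}} type, its wire layout (table id followed by a 4-byte sequence 
number) and the {{handle}} method are all hypothetical and assumed only for 
illustration. The one thing taken from the proposal is the sentinel value -1.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical sketch: none of these names come from the Cassandra code base.
final class KeepAliveSketch {
    // Sentinel proposed above: a "received" message carrying this sequence
    // number acknowledges no file and only signals that the peer is alive.
    static final int KEEP_ALIVE_SEQ = -1;

    // Assumed shape of a "received" message: table id plus file sequence number.
    static final class Received {
        final UUID cfId;
        final int sequenceNumber;

        Received(UUID cfId, int sequenceNumber) {
            this.cfId = cfId;
            this.sequenceNumber = sequenceNumber;
        }

        boolean isKeepAlive() {
            return sequenceNumber == KEEP_ALIVE_SEQ;
        }

        // Assumed wire layout: 16 bytes of UUID followed by a 4-byte int.
        ByteBuffer serialize() {
            ByteBuffer buf = ByteBuffer.allocate(20);
            buf.putLong(cfId.getMostSignificantBits());
            buf.putLong(cfId.getLeastSignificantBits());
            buf.putInt(sequenceNumber);
            buf.flip();
            return buf;
        }

        static Received deserialize(ByteBuffer buf) {
            long most = buf.getLong();
            long least = buf.getLong();
            return new Received(new UUID(most, least), buf.getInt());
        }
    }

    // A keep-alive refers to no table, so a zero UUID is used as a placeholder.
    static Received keepAlive() {
        return new Received(new UUID(0L, 0L), KEEP_ALIVE_SEQ);
    }

    // Receiver side: a keep-alive only resets the idle timer; any other
    // sequence number is treated as a normal file acknowledgement.
    static String handle(Received msg) {
        if (msg.isKeepAlive()) {
            return "keep-alive";
        }
        return "file " + msg.sequenceNumber + " received";
    }
}
```

Since the message format stays the same, a peer that knows about the sentinel 
can tell the two cases apart without a protocol version bump. How a peer that 
does not know about it would react to -1 is not covered by this sketch.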

> Secondary index creation causes moves/bootstraps to fail
> --------------------------------------------------------
>
>                 Key: CASSANDRA-8343
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8343
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Michael Frisch
>            Assignee: Paulo Motta
>
> Node moves/bootstraps are failing if the stream timeout is set to a value in 
> which secondary index creation cannot complete.  This happens because at the 
> end of the very last stream the StreamInSession.closeIfFinished() function 
> calls maybeBuildSecondaryIndexes on every column family.  If the stream time 
> + all CF's index creation takes longer than your stream timeout then the 
> socket closes from the sender's side, the receiver of the stream tries to 
> write to said socket because it's not null, an IOException is thrown but not 
> caught in closeIfFinished(), the exception is caught somewhere and not 
> logged, AbstractStreamSession.close() is never called, and the CountDownLatch 
> is never decremented.  This causes the move/bootstrap to continue forever 
> until the node is restarted.
> This problem of stream time + secondary index creation time exists on 
> decommissioning/unbootstrap as well but since it's on the sending side the 
> timeout triggers the onFailure() callback which does decrement the 
> CountDownLatch leading to completion.
> A cursory glance at the 2.0 code leads me to believe this problem would exist 
> there as well.
> Temporary workaround: set a really high/infinite stream timeout.


