[ https://issues.apache.org/jira/browse/CASSANDRA-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464183#comment-17464183 ]

Francisco Guerrero commented on CASSANDRA-17116:
------------------------------------------------

[~djoshi] and I took another look at this, and it seems that the race is happening in {{org.apache.cassandra.streaming.StreamSession#maybeCompleted()}}.

More specifically, the race most likely occurs in the following block:

{code:java}
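    // sendControlMessage() returns a future; the send may still be pending here
    channel.sendControlMessage(new CompleteMessage());
    // ...but the session is closed immediately, without waiting on that future
    closeSession(State.COMPLETE);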
{code}

The {{channel.sendControlMessage}} call returns a future, and we immediately close the session without waiting for that future to complete. In most cases the message is delivered in time, but network delays, system load, or thread scheduling can cause the {{CompleteMessage}} to be sent or received after the session has been closed, which triggers the {{java.nio.channels.ClosedChannelException}}.

A potential solution is to add a listener to the future and close the session only once the send has completed:

{code:java}
    Future<?> messageFuture = channel.sendControlMessage(new CompleteMessage());
    // close the session only after the send completes (whether it succeeded or failed)
    messageFuture.addListener(f -> closeSession(State.COMPLETE));
{code}
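To make the ordering problem concrete, here is a minimal, self-contained Java sketch that models it with {{CompletableFuture}} rather than the Netty-style future returned by the streaming channel. {{ToyChannel}}, {{racyClose()}} and {{closeAfterSend()}} are hypothetical names invented for illustration, the fixed sleep only approximates network or scheduling lag, and {{whenComplete}} stands in for {{addListener}}.

{code:java}
import java.nio.channels.ClosedChannelException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

public class CompleteMessageRace {

    /** Hypothetical channel whose sends complete asynchronously on a single "event loop" thread. */
    static final class ToyChannel {
        private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        private final AtomicBoolean closed = new AtomicBoolean(false);

        /** Queues a send and returns immediately; the send fails if the channel is closed by then. */
        CompletableFuture<Void> sendControlMessage(String message) {
            CompletableFuture<Void> result = new CompletableFuture<>();
            eventLoop.execute(() -> {
                sleepQuietly(200); // simulated network delay or scheduling lag
                if (closed.get())
                    result.completeExceptionally(new ClosedChannelException());
                else
                    result.complete(null);
            });
            return result;
        }

        void close() { closed.set(true); }

        void shutdown() { eventLoop.shutdown(); }
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Waits for the send and reports whether it failed with ClosedChannelException. */
    static boolean failedWithClosedChannel(CompletableFuture<Void> future) {
        try {
            future.join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof ClosedChannelException;
        }
    }

    /** Mirrors the current code: close right after queuing the send, without waiting. */
    static boolean racyClose() {
        ToyChannel channel = new ToyChannel();
        CompletableFuture<Void> sent = channel.sendControlMessage("COMPLETE");
        channel.close(); // analogous to closeSession(State.COMPLETE)
        boolean failed = failedWithClosedChannel(sent);
        channel.shutdown();
        return failed;
    }

    /** Mirrors the proposal: close from a completion callback, after the send has finished. */
    static boolean closeAfterSend() {
        ToyChannel channel = new ToyChannel();
        CompletableFuture<Void> sent = channel.sendControlMessage("COMPLETE");
        // Plays the role of messageFuture.addListener(f -> closeSession(State.COMPLETE)):
        // the callback runs once the send completes, successfully or not.
        sent.whenComplete((ignored, error) -> channel.close());
        boolean failed = failedWithClosedChannel(sent);
        channel.shutdown();
        return failed;
    }

    public static void main(String[] args) {
        System.out.println("racy close -> CompleteMessage lost: " + racyClose());
        System.out.println("close after send -> CompleteMessage lost: " + closeAfterSend());
    }
}
{code}

With the simulated delay, the racy ordering typically loses the message to a {{ClosedChannelException}}, while closing from the completion callback lets the send finish first. As in the proposed change, the callback closes the session regardless of whether the send succeeded, so a failed send still does not leave the session open.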

> When zero-copy-streaming sees a channel close this triggers the disk failure policy
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17116
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17116
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Streaming
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 4.x
>
>
> Found in CASSANDRA-17085.
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1069/workflows/26b7b83a-686f-4516-a56a-0709d428d4f2/jobs/7264
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1069/workflows/26b7b83a-686f-4516-a56a-0709d428d4f2/jobs/7256
> {code}
> ERROR [Stream-Deserializer-/127.0.0.1:7000-f2eb1a15] 2021-11-02 21:35:40,983 DefaultFSErrorHandler.java:104 - Exiting forcefully due to file system exception on startup, disk failure policy "stop"
> org.apache.cassandra.io.FSWriteError: java.nio.channels.ClosedChannelException
>       at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.write(BigTableZeroCopyWriter.java:227)
>       at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.writeComponent(BigTableZeroCopyWriter.java:206)
>       at org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:125)
>       at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:84)
>       at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:51)
>       at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:37)
>       at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
>       at org.apache.cassandra.streaming.StreamDeserializingTask.run(StreamDeserializingTask.java:62)
>       at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.channels.ClosedChannelException: null
>       at org.apache.cassandra.net.AsyncStreamingInputPlus.reBuffer(AsyncStreamingInputPlus.java:136)
>       at org.apache.cassandra.net.AsyncStreamingInputPlus.consume(AsyncStreamingInputPlus.java:155)
>       at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.write(BigTableZeroCopyWriter.java:217)
>       ... 9 common frames omitted
> {code}
> When bootstrap fails and streaming is closed, this triggers the disk failure policy, which by default causes the JVM to halt (if this happens outside of bootstrap, we stop transports and keep the JVM up).
> org.apache.cassandra.streaming.StreamDeserializingTask attempts to handle this by ignoring this exception, but the call to org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize does its own try/catch and inspects the exception, triggering this condition.
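The stack trace shows that the {{ClosedChannelException}} surfaces only as the cause of an {{FSWriteError}}, so code that inspects the top-level throwable sees a file-system write error rather than a closed channel. The sketch below illustrates that difference; it is hypothetical and not Cassandra code: {{ToyFSWriteError}} is a stand-in for {{FSWriteError}}, and {{causedByClosedChannel()}} is an invented helper that walks the cause chain.

{code:java}
import java.io.IOError;
import java.nio.channels.ClosedChannelException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class CauseChainCheck {

    /** Hypothetical stand-in for FSWriteError: an Error that carries the real I/O failure as its cause. */
    static final class ToyFSWriteError extends IOError {
        ToyFSWriteError(Throwable cause) { super(cause); }
    }

    /** Returns true if t or any throwable in its cause chain is a ClosedChannelException. */
    static boolean causedByClosedChannel(Throwable t) {
        // Identity-based set guards against (rare) cycles in the cause chain.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur instanceof ClosedChannelException)
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable thrown = new ToyFSWriteError(new ClosedChannelException());
        // A type check on the top-level throwable sees only the write error...
        System.out.println("top-level is ClosedChannelException: " + (thrown instanceof ClosedChannelException));
        // ...while walking the cause chain finds the closed channel.
        System.out.println("cause chain has ClosedChannelException: " + causedByClosedChannel(thrown));
    }
}
{code}

Whether a cause-chain check like this belongs in {{StreamDeserializingTask}}, in {{IncomingStreamMessage$1.deserialize}}, or is unnecessary once the session-close ordering is fixed is a design question this sketch does not settle.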



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
