[ 
https://issues.apache.org/jira/browse/CASSANDRA-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17473170#comment-17473170
 ] 

David Capwell commented on CASSANDRA-17116:
-------------------------------------------

I think we finally found the issue!  When the follower sends the last message, 
there is a race condition, because we do the following:

{code}
send(lastMessage);
// with bad OS scheduling issues, this may happen AFTER the initiator closes the socket
closeSession(COMPLETE);
{code}

If the OS scheduler runs the closeSession(COMPLETE) logic AFTER the follower 
receives the socket-close message from the initiator, then the socket is 
already closed, which causes the ClosedChannelException; since the session 
state isn't final at that point, we fail the session.
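
A minimal sketch of the ordering problem described above, using a hypothetical Session model rather than Cassandra's actual streaming classes. The names Session, onChannelClosed, sendThenClose, and closeThenSend are assumptions made for illustration; only closeSession(COMPLETE) and the send-then-close ordering come from the comment. The second ordering is shown only for contrast and is not claimed to be the fix applied for this issue.

```java
// Hypothetical model of a follower-side streaming session (not Cassandra's real classes).
class SessionCloseRaceSketch {
    enum State { STREAMING, COMPLETE, FAILED }

    static final class Session {
        private State state = State.STREAMING;

        // Mirrors closeSession(COMPLETE): only a non-final session can move to a final state.
        void closeSession(State finalState) {
            if (state == State.STREAMING) state = finalState;
        }

        // Assumed behaviour: a socket close seen while the state is not final fails the session.
        void onChannelClosed() {
            if (state == State.STREAMING) state = State.FAILED;
        }

        State state() { return state; }
    }

    // Ordering from the comment: send the last message, then close the session.
    // The flag models the scheduler letting the peer's close arrive in between.
    static State sendThenClose(boolean peerClosesBeforeCloseSession) {
        Session s = new Session();
        // send(lastMessage) would happen here
        if (peerClosesBeforeCloseSession) s.onChannelClosed();
        s.closeSession(State.COMPLETE);
        if (!peerClosesBeforeCloseSession) s.onChannelClosed();
        return s.state();
    }

    // Contrast case (an assumption): the state is made final before the peer can react.
    static State closeThenSend() {
        Session s = new Session();
        s.closeSession(State.COMPLETE);
        // send(lastMessage) would happen here
        s.onChannelClosed();
        return s.state();
    }

    public static void main(String[] args) {
        System.out.println(sendThenClose(false)); // COMPLETE
        System.out.println(sendThenClose(true));  // FAILED
        System.out.println(closeThenSend());      // COMPLETE
    }
}
```

In this model the session only fails when the peer's close is observed in the window between the send and the state change, which matches the intermittent nature of the failure.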

> When zero-copy-streaming sees a channel close this triggers the disk failure 
> policy
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17116
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17116
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Streaming
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 4.x
>
>
> Found in CASSANDRA-17085.
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1069/workflows/26b7b83a-686f-4516-a56a-0709d428d4f2/jobs/7264
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/1069/workflows/26b7b83a-686f-4516-a56a-0709d428d4f2/jobs/7256
> {code}
> ERROR [Stream-Deserializer-/127.0.0.1:7000-f2eb1a15] 2021-11-02 21:35:40,983 DefaultFSErrorHandler.java:104 - Exiting forcefully due to file system exception on startup, disk failure policy "stop"
> org.apache.cassandra.io.FSWriteError: java.nio.channels.ClosedChannelException
>       at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.write(BigTableZeroCopyWriter.java:227)
>       at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.writeComponent(BigTableZeroCopyWriter.java:206)
>       at org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader.read(CassandraEntireSSTableStreamReader.java:125)
>       at org.apache.cassandra.db.streaming.CassandraIncomingFile.read(CassandraIncomingFile.java:84)
>       at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:51)
>       at org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize(IncomingStreamMessage.java:37)
>       at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:50)
>       at org.apache.cassandra.streaming.StreamDeserializingTask.run(StreamDeserializingTask.java:62)
>       at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.channels.ClosedChannelException: null
>       at org.apache.cassandra.net.AsyncStreamingInputPlus.reBuffer(AsyncStreamingInputPlus.java:136)
>       at org.apache.cassandra.net.AsyncStreamingInputPlus.consume(AsyncStreamingInputPlus.java:155)
>       at org.apache.cassandra.io.sstable.format.big.BigTableZeroCopyWriter.write(BigTableZeroCopyWriter.java:217)
>       ... 9 common frames omitted
> {code}
> When bootstrap fails and streaming is closed, this triggers the disk failure 
> policy, which halts the JVM by default (if this happens outside of 
> bootstrap, then we stop transports and keep the JVM up).
> org.apache.cassandra.streaming.StreamDeserializingTask attempts to handle 
> this by ignoring the exception, but the call to 
> org.apache.cassandra.streaming.messages.IncomingStreamMessage$1.deserialize 
> does its own try/catch and inspects the exception, which triggers this 
> condition.
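
To illustrate the exception handling described in the issue, here is a rough sketch of how a caller might tell a wrapped channel close apart from a genuine file system error by walking the cause chain. The class and method names (ChannelCloseCauseSketch, causedByChannelClose) are hypothetical and not part of Cassandra; only java.nio.channels.ClosedChannelException and Throwable.getCause() are real APIs.

```java
import java.nio.channels.ClosedChannelException;

// Hypothetical helper (not Cassandra code): detects a channel close anywhere in the cause chain.
class ChannelCloseCauseSketch {
    static boolean causedByChannelClose(Throwable t) {
        // Walk from the outermost exception down through each cause until the chain ends.
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof ClosedChannelException) return true;
        }
        return false;
    }
}
```

With a check like this, an error that wraps a ClosedChannelException, as the FSWriteError in the trace above does, could be recognised before it reaches a handler that applies the disk failure policy. Whether that is the right place to intervene is a design question this sketch does not settle.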



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

