[ 
https://issues.apache.org/jira/browse/CASSANDRA-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063456#comment-17063456
 ] 

Kevin Gallardo commented on CASSANDRA-14520:
--------------------------------------------

This exception occurs when a read of a local SSTable is interrupted, whether by 
another thread or otherwise. 
When the interruption happens, the error is caught by the {{ChannelProxy}} 
and rethrown as an {{FSReadError}}. 

My understanding is that the {{FSReadError}} is then not caught, but handled at 
a higher level in the Cassandra daemon by the {{DefaultFSErrorHandler}}. 

The {{DefaultFSErrorHandler}} has multiple ways of handling an FSError, chosen 
according to how the {{disk_failure_policy}} is configured. If it is configured 
to {{stop}}, it will indeed stop the daemon/node. It seems that by default the 
{{disk_failure_policy}} is {{ignore}} though, which would mean it doesn't stop 
the node.
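
To illustrate the policy-driven handling described above, here is a minimal 
sketch. It is not the actual {{DefaultFSErrorHandler}} code: the names 
{{FailurePolicy}}, {{NodeSketch}} and {{FsErrorHandlerSketch}} are hypothetical, 
and only the two policy values mentioned above are modelled.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical mirror of the two disk_failure_policy values discussed above.
enum FailurePolicy { STOP, IGNORE }

// Hypothetical stand-in for a node that can be stopped.
class NodeSketch {
    final AtomicBoolean running = new AtomicBoolean(true);

    void stop() {
        running.set(false);
    }
}

// Hypothetical handler: the configured policy decides what a filesystem error does.
class FsErrorHandlerSketch {
    private final FailurePolicy policy;
    private final NodeSketch node;

    FsErrorHandlerSketch(FailurePolicy policy, NodeSketch node) {
        this.policy = policy;
        this.node = node;
    }

    void handle(Error fsError) {
        switch (policy) {
            case STOP:
                // A filesystem error under this policy takes the node down.
                node.stop();
                break;
            case IGNORE:
                // The error is tolerated and the node keeps running.
                break;
        }
    }

    public static void main(String[] args) {
        NodeSketch node = new NodeSketch();
        new FsErrorHandlerSketch(FailurePolicy.STOP, node).handle(new Error("simulated FS error"));
        System.out.println("running after STOP policy: " + node.running.get());
    }
}
```

The point of the sketch is that anything classified as a filesystem error 
reaches this switch, so a misclassified interruption is treated exactly like a 
real disk failure.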

I am not 100% sure that, since the CASSANDRA-15066 changes, long reads can no 
longer be interrupted by another thread; i.e. it seems this exception could 
still be thrown at any time.

In any case, it seems that a long read of a local SSTable being interrupted 
should never cause the node to shut down? In which case I would suggest 
catching the {{ClosedByInterruptException}} in {{ChannelProxy}} and 
rethrowing it as a runtime exception, which should then be caught in 
{{NettyStreamingMessageSender.FileStreamTask#run}} and handled more 
gracefully.
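
A rough sketch of the idea, not the actual {{ChannelProxy}} code: 
{{InterruptAwareReader}}, {{InterruptedReadException}} and {{DiskReadError}} are 
hypothetical names, with {{DiskReadError}} standing in for {{FSReadError}}. It 
relies only on the documented {{java.nio}} behaviour that a thread whose 
interrupt status is set receives a {{ClosedByInterruptException}} when it 
performs I/O on an interruptible channel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for FSReadError: a genuine disk-level read failure.
class DiskReadError extends Error {
    DiskReadError(Throwable cause) {
        super(cause);
    }
}

// Hypothetical unchecked exception: the read was interrupted, the disk is fine.
class InterruptedReadException extends RuntimeException {
    InterruptedReadException(Throwable cause) {
        super(cause);
    }
}

class InterruptAwareReader {
    // Reads from the channel, separating interruptions from real I/O failures.
    static int read(FileChannel channel, ByteBuffer dst, long position) {
        try {
            return channel.read(dst, position);
        } catch (ClosedByInterruptException e) {
            // Must come first: ClosedByInterruptException is itself an IOException.
            throw new InterruptedReadException(e);
        } catch (IOException e) {
            // Everything else is still reported as a disk error.
            throw new DiskReadError(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sstable-sketch", ".db");
        Files.write(file, new byte[] {1, 2, 3, 4});
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Simulate an interruption arriving before the read starts.
            Thread.currentThread().interrupt();
            try {
                read(channel, ByteBuffer.allocate(4), 0);
            } catch (InterruptedReadException e) {
                // A caller such as a streaming task could abort just this task here.
                System.out.println("read interrupted, cause: " + e.getCause());
            } finally {
                // Clear the interrupt flag so cleanup below is unaffected.
                Thread.interrupted();
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With this split, only the second {{catch}} branch would ever reach the 
filesystem error handler; the interrupted read surfaces as an ordinary runtime 
exception that the calling task can handle on its own.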

Does that make sense?

[~bdeggleston] [~benedict]

> ClosedChannelException handled as FSError
> -----------------------------------------
>
>                 Key: CASSANDRA-14520
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14520
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Streaming and Messaging
>            Reporter: Blake Eggleston
>            Assignee: Kevin Gallardo
>            Priority: Urgent
>             Fix For: 4.0
>
>
> After the messaging service netty refactor, I’ve seen a few instances where a 
> closed socket causes a ClosedChannelException (an IOException subclass) to be 
> thrown. The exception is caught by ChannelProxy, interpreted as a disk error, 
> and is then re-thrown as an FSError, causing the node to be shut down.



