[
https://issues.apache.org/jira/browse/CASSANDRA-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063456#comment-17063456
]
Kevin Gallardo commented on CASSANDRA-14520:
--------------------------------------------
This exception occurs when a read of a local SSTable is interrupted, for
example by another thread.
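The interruption described above can be reproduced with plain JDK classes. The
sketch below is only an illustration and is not Cassandra code: the class name
{{InterruptedReadDemo}}, the temp file, and the 4-byte payload are all made up.
It relies on the documented behaviour of interruptible NIO channels, where a
read attempted by a thread whose interrupt flag is set closes the channel and
fails with a {{ClosedByInterruptException}}.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of Cassandra.
class InterruptedReadDemo {

    // Reads from a temp file while the current thread's interrupt flag is set
    // and returns the simple name of the exception thrown, or "none".
    static String readWhileInterrupted() {
        Path file = null;
        try {
            file = Files.createTempFile("sstable-demo", ".db");
            Files.write(file, new byte[] {1, 2, 3, 4});
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                // Simulate an interrupt arriving from another thread.
                Thread.currentThread().interrupt();
                channel.read(ByteBuffer.allocate(4), 0);
                return "none";
            }
        } catch (ClosedByInterruptException e) {
            // The JDK closed the channel because of the interrupt;
            // the disk itself is healthy.
            return e.getClass().getSimpleName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Clear the interrupt flag so the caller is unaffected.
            Thread.interrupted();
            try {
                if (file != null) Files.deleteIfExists(file);
            } catch (IOException ignored) {
                // Best-effort cleanup of the temp file.
            }
        }
    }
}
```

Note that {{ClosedByInterruptException}} is an {{IOException}} subclass, so a
generic {{catch (IOException e)}} cannot tell it apart from a real disk problem
unless it is caught first.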
When this interruption happens, the error is caught by the {{ChannelProxy}}
and rethrown as an {{FSReadError}}.
My understanding is that the {{FSReadError}} is then not caught, but handled at
a higher level in the Cassandra daemon by the {{DefaultFSErrorHandler}}.
The {{DefaultFSErrorHandler}} has multiple ways of treating an {{FSError}},
chosen by how the {{disk_failure_policy}} is configured. If it is configured to
{{stop}}, it will indeed stop the daemon/node. It seems that by default the
{{disk_failure_policy}} is {{ignore}} though, which would mean it doesn't stop
the node.
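To make the policy dependence concrete, here is a deliberately simplified
sketch. It is not the actual handler: {{FsErrorPolicySketch}} and
{{shouldStopNode}} are hypothetical names, and only the two policy values
mentioned above are modelled.

```java
// Hypothetical, simplified model of a policy-driven error handler.
// It only illustrates that the same error leads to different outcomes
// depending on the configured policy.
class FsErrorPolicySketch {

    // Only the two values named in this comment are modelled here.
    enum DiskFailurePolicy { stop, ignore }

    // Returns true if a file-system error should stop the node
    // under the given policy.
    static boolean shouldStopNode(DiskFailurePolicy policy) {
        switch (policy) {
            case stop:
                return true;   // the error takes the node down
            case ignore:
                return false;  // the error does not stop the node
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }
}
```

Under this model, an interrupted read that is reported as a file-system error
only stops the node when the policy is {{stop}}.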
I am not 100% sure that the CASSANDRA-15066 changes prevent long reads from
being interrupted by another thread, i.e. it seems this exception could still
be thrown at any time.
In any case, it seems that a long read of a local SSTable being interrupted
shouldn't cause the node to shut down, ever? In that case I would suggest
catching the {{ClosedByInterruptException}} in {{ChannelProxy}} and rethrowing
it as a runtime exception, which should then be caught in
{{NettyStreamingMessageSender.FileStreamTask#run}} so that the interruption is
handled more gracefully.
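A rough sketch of the suggestion above, using stand-in types rather than the
real Cassandra classes: {{InterruptAwareReadSketch}},
{{ReadInterruptedException}}, {{SketchFSReadError}} and {{runTask}} are all
hypothetical names, with {{read}} playing the role of the {{ChannelProxy}} read
path and {{runTask}} playing the role of the streaming task's {{run}} method.

```java
import java.io.IOError;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; none of these types are the real Cassandra classes.
class InterruptAwareReadSketch {

    // Runtime exception signalling an interrupted read, not a disk failure.
    static class ReadInterruptedException extends RuntimeException {
        ReadInterruptedException(Throwable cause) { super(cause); }
    }

    // Stand-in for an error that would reach the file-system error handler.
    static class SketchFSReadError extends IOError {
        SketchFSReadError(Throwable cause) { super(cause); }
    }

    // Stand-in for the proxy's read path: the interrupt case is caught
    // first, so only other I/O failures are reported as disk errors.
    static int read(FileChannel channel, ByteBuffer buffer, long position) {
        try {
            return channel.read(buffer, position);
        } catch (ClosedByInterruptException e) {
            throw new ReadInterruptedException(e);
        } catch (IOException e) {
            throw new SketchFSReadError(e);
        }
    }

    // Stand-in for the streaming task: it catches the runtime exception
    // and ends the task instead of letting it escalate.
    static String runTask(boolean interruptFirst) {
        Path file = null;
        try {
            file = Files.createTempFile("sstable-sketch", ".db");
            Files.write(file, new byte[] {1, 2, 3, 4});
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                if (interruptFirst) {
                    Thread.currentThread().interrupt(); // simulate the interrupt
                }
                int n = read(channel, ByteBuffer.allocate(4), 0);
                return "read " + n + " bytes";
            }
        } catch (ReadInterruptedException e) {
            return "task aborted: read interrupted";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            Thread.interrupted(); // clear the flag for the caller
            try {
                if (file != null) Files.deleteIfExists(file);
            } catch (IOException ignored) {
                // Best-effort cleanup of the temp file.
            }
        }
    }
}
```

With this shape, calling {{runTask(true)}} ends the task with a message rather
than raising an error that a failure policy could act on, while genuine I/O
failures still surface as {{SketchFSReadError}}.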
Does that make sense?
[~bdeggleston] [~benedict]
> ClosedChannelException handled as FSError
> -----------------------------------------
>
> Key: CASSANDRA-14520
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14520
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Streaming and Messaging
> Reporter: Blake Eggleston
> Assignee: Kevin Gallardo
> Priority: Urgent
> Fix For: 4.0
>
>
> After the messaging service netty refactor, I’ve seen a few instances where a
> closed socket causes a ClosedChannelException (an IOException subclass) to be
> thrown. The exception is caught by ChannelProxy, interpreted as a disk error,
> and is then re-thrown as an FSError, causing the node to be shut down.