[
https://issues.apache.org/jira/browse/HDFS-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006598#comment-14006598
]
Colin Patrick McCabe commented on HDFS-6227:
--------------------------------------------
It seems that whenever a thread is interrupted while in {{FileChannel#read}},
the channel is immediately closed (the interrupted thread gets a
{{ClosedByInterruptException}}). This causes problems
for short-circuit reads, since multiple threads may be (p)reading from a single
pair of file descriptors (for the block and the checksum).
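
The closing behavior can be reproduced outside HDFS. The following is a minimal sketch, not HDFS code: the class name, temp file, and result strings are made up for illustration, and two reads on one thread stand in for two reader threads sharing a channel. Setting the interrupt status just before the read is the deterministic variant of being interrupted mid-read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: shows that an interrupt during a FileChannel read
// closes the channel for every other user of that channel.
public class InterruptClosesChannelDemo {

    static String run() {
        StringBuilder result = new StringBuilder();
        try {
            Path tmp = Files.createTempFile("scr-demo", ".dat");
            Files.write(tmp, new byte[] {1, 2, 3, 4});
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                // "Reader A": the interrupt status is set when it enters the pread.
                Thread.currentThread().interrupt();
                try {
                    ch.read(ByteBuffer.allocate(4), 0);
                    result.append("first-read-ok");
                } catch (ClosedByInterruptException e) {
                    result.append("first-read-interrupted");
                } finally {
                    Thread.interrupted(); // clear the flag before continuing
                }

                result.append(ch.isOpen() ? ";open" : ";closed");

                // "Reader B": was never interrupted, but shares the channel.
                try {
                    ch.read(ByteBuffer.allocate(4), 0);
                    result.append(";second-read-ok");
                } catch (ClosedChannelException e) {
                    result.append(";second-read-failed");
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The second read fails even though nothing interrupted it, which is the situation a concurrent short-circuit reader ends up in.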
We can certainly check if either channel was closed in
{{BlockReaderLocal#close}}, and mark the replica as stale in that case. That
will limit the harm somewhat. But there isn't any easy way to save concurrent
readers from getting the same {{ClosedChannelException}}. Theoretically we
could wrap every call to {{BlockReader#read}} in a retry block that treated
this problem differently from a regular I/O error. But there are a lot of
callers and that retry code is already a little complex.
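
For reference, the kind of retry wrapper meant here might look roughly like the sketch below. Everything in it is hypothetical ({{ReadOp}}, {{readWithRetry}}, and the idea of a fallback read path are assumptions for illustration, not existing HDFS code); it only shows what "treating a closed channel differently from a regular I/O error" could mean.

```java
import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;

// Hypothetical sketch of a retry block around a read call.
public class ClosedChannelRetrySketch {

    /** Hypothetical abstraction over one read attempt. */
    @FunctionalInterface
    public interface ReadOp {
        int read() throws IOException;
    }

    /**
     * Runs the primary read. If the shared channel turns out to have been
     * closed by someone else, runs the fallback once (for example a read
     * through a freshly opened reader). Other I/O errors propagate unchanged.
     */
    public static int readWithRetry(ReadOp primary, ReadOp fallback) throws IOException {
        try {
            return primary.read();
        } catch (ClosedByInterruptException e) {
            // This thread itself was interrupted: do not retry, let the caller see it.
            throw e;
        } catch (ClosedChannelException e) {
            // Another reader's interrupt closed the channel under us: retry once.
            return fallback.read();
        }
    }
}
```

Every caller of the read path would need to go through something like this, which is where the cost comes from.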
For the purposes of YARN, I think checking whether the channel is closed in
{{BlockReaderLocal#close}} is enough, since each container will only be running
one thing at a time, as I understand it.
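
The check on close could be sketched as follows. This is an assumption-laden illustration rather than the actual patch: {{Replica}}, {{LocalReader}}, {{StubChannel}}, and {{markStale}} are hypothetical names standing in for the real short-circuit classes.

```java
import java.nio.channels.Channel;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: mark the shared replica stale when a reader closes
// and finds that either underlying channel is no longer open.
public class StaleOnCloseSketch {

    /** Minimal in-memory channel so the sketch runs without real files. */
    public static final class StubChannel implements Channel {
        private volatile boolean open = true;

        @Override public boolean isOpen() { return open; }
        @Override public void close() { open = false; }
    }

    /** Hypothetical stand-in for the replica shared by all local readers. */
    public static final class Replica {
        final Channel dataChannel;      // block data
        final Channel checksumChannel;  // block checksums
        private final AtomicBoolean stale = new AtomicBoolean(false);

        public Replica(Channel dataChannel, Channel checksumChannel) {
            this.dataChannel = dataChannel;
            this.checksumChannel = checksumChannel;
        }

        public void markStale() { stale.set(true); }
        public boolean isStale() { return stale.get(); }
    }

    /** Hypothetical stand-in for a local block reader. */
    public static final class LocalReader implements AutoCloseable {
        private final Replica replica;

        public LocalReader(Replica replica) { this.replica = replica; }

        @Override
        public void close() {
            // If an interrupt closed either channel, later readers must not
            // reuse this replica's file descriptors.
            if (!replica.dataChannel.isOpen() || !replica.checksumChannel.isOpen()) {
                replica.markStale();
            }
        }
    }
}
```

This does not help a reader that is already in flight on the closed channel; it only keeps the broken descriptors from being handed out again.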
> Short circuit read failed due to ClosedChannelException
> -------------------------------------------------------
>
> Key: HDFS-6227
> URL: https://issues.apache.org/jira/browse/HDFS-6227
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Jing Zhao
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-6227.000.patch,
> ShortCircuitReadInterruption.test.patch
>
>
> While running tests in a single node cluster, where short circuit read is
> enabled and multiple threads may read the same file concurrently, one of the
> reads got a ClosedChannelException and failed. See the comment for the full
> exception trace.
--
This message was sent by Atlassian JIRA
(v6.2#6252)