[
https://issues.apache.org/jira/browse/HDFS-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jing Zhao updated HDFS-8602:
----------------------------
Attachment: HDFS-8602.000.patch
Thanks very much for reporting the issue and working on this, [~kaisasak]!
I also did some debugging on the issue. The cause looks like a deadlock:
after hitting the exception while reading the corrupted block, {{readToBuffer}}
tries to print a warning message, and in doing so calls {{getCurrentBlock}}.
{{getCurrentBlock}} needs to acquire the input stream's lock, which is currently
held by the main thread, while the main thread is waiting for the response from
the reading threads, so neither side can make progress.
The patch includes a simple fix and also a unit test that can reproduce the
issue ({{testReadCorruptedData2}}).
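
For anyone following along, here is a minimal sketch of the locking pattern
described above. It is not the HDFS code and not necessarily what the patch
does: {{StreamSketch}}, {{read}} and the {{workerTakesLock}} flag are
hypothetical names, and the 500 ms timeout is only there so that the hang
becomes observable instead of blocking forever.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the input stream; not the real HDFS class.
class StreamSketch {
    private final String currentBlock = "blk_1";

    // Like getCurrentBlock in the description: needs the stream's lock.
    synchronized String getCurrentBlock() {
        return currentBlock;
    }

    // Holds the stream's lock for the whole call, like the main thread.
    // Returns true if the worker finished, false if the wait timed out.
    synchronized boolean read(boolean workerTakesLock) throws Exception {
        // Daemon worker thread so a stuck worker cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        // Value captured while the lock is already held by this thread.
        final String snapshot = currentBlock;
        Future<String> result = pool.submit(() -> {
            // Simulated error path: build the warning message.
            // With workerTakesLock the worker blocks on the stream's lock,
            // which the caller holds until read() returns.
            String block = workerTakesLock ? getCurrentBlock() : snapshot;
            return "checksum error on " + block;
        });
        try {
            // Assumption: a bounded wait replaces the unbounded wait that
            // makes the real client hang.
            result.get(500, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch {{new StreamSketch().read(true)}} times out and returns false,
because the worker cannot enter {{getCurrentBlock}} while the caller holds the
lock, whereas {{read(false)}} returns true because the worker only uses a value
captured before it started. Avoiding the lock on the warning path is one
possible way out; other approaches are equally plausible.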
> Erasure Coding: Client can't read(decode) the EC files which have corrupt
> blocks.
> ---------------------------------------------------------------------------------
>
> Key: HDFS-8602
> URL: https://issues.apache.org/jira/browse/HDFS-8602
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Takanobu Asanuma
> Assignee: Kai Sasaki
> Fix For: HDFS-7285
>
> Attachments: HDFS-8602.000.patch
>
>
> Before the DataNode(s) report the bad block(s), when the Client reads an EC
> file that has bad blocks, the Client hangs, and there are no error messages.
> (When the Client reads a replicated file that has bad blocks, the bad blocks
> are reconstructed at the same time, and the Client can read it.)