[
https://issues.apache.org/jira/browse/HDFS-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249649#comment-15249649
]
Walter Su commented on HDFS-5280:
---------------------------------
+1 for catching the exception. The same exception will also cause {{BlockScanner}}
to shut down.
We should be cautious about catching any {{RuntimeException}}. Instead of adding a
{{catch}} to the outer try-finally clause, how about catching the exact
exception at the place where it is thrown, as we did in
{{FSNamesystem.java}}:
{code}
try {
  checksumType = DataChecksum.Type.valueOf(checksumTypeStr);
} catch (IllegalArgumentException iae) {
  throw new IOException("Invalid checksum type in "
      + DFS_CHECKSUM_TYPE_KEY + ": " + checksumTypeStr);
}
{code}
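As a minimal sketch of that narrow-catch pattern (the enum and helper below are illustrative stand-ins, not the actual HDFS types): the unchecked {{IllegalArgumentException}} is converted into a checked {{IOException}} right at the throw site, so a caller such as the scanner loop is not killed by an unexpected {{RuntimeException}}.

```java
import java.io.IOException;

public class NarrowCatch {
    // Stand-in for DataChecksum.Type in this sketch.
    enum ChecksumType { NULL, CRC32, CRC32C }

    // Hypothetical helper mirroring the FSNamesystem pattern: catch the
    // exact RuntimeException where it is thrown and rethrow it as a
    // checked IOException with a descriptive message.
    static ChecksumType parseChecksumType(String s) throws IOException {
        try {
            return ChecksumType.valueOf(s);
        } catch (IllegalArgumentException iae) {
            throw new IOException("Invalid checksum type: " + s, iae);
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(parseChecksumType("CRC32C"));
            parseChecksumType("BOGUS"); // corrupt value: handled, not fatal
        } catch (IOException e) {
            System.out.println("handled: " + e.getMessage());
        }
    }
}
```

The caller then deals with one well-defined checked exception instead of guarding a whole try-finally block against arbitrary {{RuntimeException}}s.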
> Corrupted meta files on data nodes prevent DFSClient from connecting to data
> nodes and updating corruption status to name node.
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-5280
> URL: https://issues.apache.org/jira/browse/HDFS-5280
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs-client
> Affects Versions: 1.1.1, 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.7.2
> Environment: Red hat enterprise 6.4
> Hadoop-2.1.0
> Reporter: Jinghui Wang
> Assignee: Andres Perez
> Attachments: HDFS-5280.patch
>
>
> Corrupted meta files prevent the DFSClient from connecting to the
> datanodes to access the blocks, so the DFSClient never performs a read on the
> block. That read is what throws the ChecksumException when file blocks are
> corrupted and reports to the namenode to mark the block as corrupt. Since the
> client never gets that far, the file status remains healthy, and so do all of
> its blocks.
> To replicate the error, put a file onto HDFS.
> Running hadoop fsck /tmp/bogus.csv -files -blocks -locations gives the
> following output:
> FSCK started for path /tmp/bogus.csv at 11:33:29
> /tmp/bogus.csv 109 bytes, 1 block(s): OK
> 0. blk_-4255166695856420554_5292 len=109 repl=3
> Find the block/meta files for 4255166695856420554 by running
> ssh datanode1.address find /hadoop/ -name "*4255166695856420554*", which
> gives the following output:
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta
> Now corrupt the meta file by running
> ssh datanode1.address "sed -i -e '1i 1234567891'
> /hadoop/data1/hdfs/current/subdir2/blk_-4255166695856420554_5292.meta"
> Running hadoop fs -cat /tmp/bogus.csv now
> shows the stack trace of the DFSClient failing to connect to the data node
> with the corrupted meta file.
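For illustration, a minimal sketch of why the prepended line breaks the meta file, assuming the .meta layout begins with a 2-byte version short followed by the checksum header (an assumption for this sketch, modeled on {{BlockMetadataHeader}}): the text that sed inserts shifts every byte of the file, so the DataNode reads ASCII digits where it expects the version and fails while parsing the header, before any checksum is ever compared against the block data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class MetaCorruption {
    public static void main(String[] args) throws IOException {
        // A meta-style header: version 1 stored as a big-endian 2-byte short.
        ByteArrayOutputStream meta = new ByteArrayOutputStream();
        new DataOutputStream(meta).writeShort(1);

        // sed -i -e '1i 1234567891' effectively prepends a text line,
        // shifting the real header bytes past the inserted digits.
        ByteArrayOutputStream corrupt = new ByteArrayOutputStream();
        corrupt.write("1234567891\n".getBytes("US-ASCII"));
        corrupt.write(meta.toByteArray());

        short good = new DataInputStream(
                new ByteArrayInputStream(meta.toByteArray())).readShort();
        short bad = new DataInputStream(
                new ByteArrayInputStream(corrupt.toByteArray())).readShort();

        // The parser now sees the bytes '1' (0x31) and '2' (0x32) as the
        // version, i.e. 0x3132 = 12594 instead of 1, and rejects the file.
        System.out.println("expected version: " + good);  // 1
        System.out.println("after sed -i:     " + bad);   // 12594
    }
}
```

That header failure happens on the DataNode when it opens the meta file to serve the read, which is why the client sees a connection failure rather than a ChecksumException.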
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)