[jira] [Commented] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot

Xiao Chen (JIRA) Fri, 23 Oct 2015 10:57:13 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971469#comment-14971469
 ]


Xiao Chen commented on HDFS-9231:
---------------------------------

Thanks a lot for the review [~yzhangal]!
{quote}
1. The description is not quite accurate per our discussion, suggest to modify. 
Especially the patch actually does change (and fix) the behavior when without 
-includeSnapshots.
{quote}
It was great to talk to you. I have updated the description. Modified patch 
summary in the end of this comment.
{quote}
2. A possible optimization in FSDirSnapshotOp#getSnapshotFiles. It seems that 
the sf variable could be calculated in caller for once before the loop in the 
caller, and pass to this method.
{quote}
My apologies for the confusion, I added some comments in this method. But 
getting sf for each snapshottable dir is needed, since /d1 and /d2 have 
different snapshotlist.
{quote}
3. final INodesInPath iip = fsd.getINodesInPath4Write(snap, false); maybe 
substituted with call to getINodesInPath
{quote}
Good catch! I updated the code to call {{getINode}} which invokes 
{{getINodesInPath}}.
{quote}
4. The check if (!corruptFileBlocks.isEmpty()) in 
listCorruptFileBlocksWithSnapshot is not needed
{quote}
Good call. Fixed.
{quote}
5. Add comment in listCorruptFileBlocks() before the call 
namenode.getNamesystem().listCorruptFileBlocksWithSnapshot, to indicate that 
snapshottableDirs is only relevant when -includeSnapshots is specified.
{quote}
Added a link to {{FSNamesystem#listCorruptFileBlocksWithSnapshot}} which 
explains that parameter in javadoc.
{quote}
6. In listCorruptFileBlocksWithSnapshot, we can add
{code}
if (snapshottableDirs == null) {
  continue;
}
{code}
to avoid the call to getSnapshotFiles.
{quote}
I'm not sure this is necessary. On one hand, it definitely saves 1 call stack. 
On the other hand, with the existence of all those loops and checks, I think 
the performance gain of saving 1 call stack would be trivial.  And the nullity 
check of snapshottableDirs is already performed as a first step in 
{{getSnapshotFiles}}.

Attached patch 005 with the above modifications. Updated summary below:
- {{fsck -list-corruptfileblocks -includeSnapshots}} will also show full dir of 
snapshots
- {{fsck -list-corruptfileblocks}} without -includeSnapshots will not show 
corrupt blocks that only have snapshot files
- NameNode WebUI's way of showing corrupted files/blocks unchanged.
- Added a sentence in NN WebUI to hint the admin to run fsck with 
-includeSnapshots, if there're snapshots present in the system.
- Some refactoring to reuse existing code in new methods getSnapshottableDirs 
and ListCorruptFileBlocksWithSnapshot
- The reasoning of keep minimal change to NN WebUI and fsck without 
-includeSnapshots is that getting all possible snapshots may be slow.

> fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
> -----------------------------------------------------------------------
>
>                 Key: HDFS-9231
>                 URL: https://issues.apache.org/jira/browse/HDFS-9231
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, 
> HDFS-9231.003.patch, HDFS-9231.004.patch, HDFS-9231.005.patch
>
>
> Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt 
> blocks with the original file dir instead of the snapshot dir, and {{fsck 
> -list-corruptfileblocks -includeSnapshots}} behave the same.
> This can be confusing because even when the original file is deleted, fsck 
> will still show that deleted file as corrupted, although what's actually 
> corrupted is the snapshot. 
> As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot

Reply via email to