[ 
https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HDFS-12618:
----------------------------------------
    Attachment: HDFS-12618.004.patch

Here goes another patch attempt. I believe I have found a solution for all the 
cases. Some explanations below, followed by rough sketches of the logic:

1) For each file in the *.snapshot* folder, it first checks if the path resolves 
to an instance of *INodeFile*. This is the case for non-renamed files. 
1.1) In this case, we need to check if the given file only exists in snapshots, 
which is possible by calling *inodeFile.isWithSnapshot()*.
1.2) If the file only exists in snapshots, we should then check whether it has 
been deleted from the original folder, appended, or truncated. 
1.3) Appended or truncated files will still have a valid inode outside of the 
snapshot folder, as long as the original file has not been deleted yet. To check 
this condition, we can call 
*dir.getINodesInPath(inodeFile.getName(),FSDirectory.DirOp.READ).validate();*.
For appended/truncated cases we then need to compare the blocks of the file in 
the snapshot folder with those of the original file, counting only the blocks 
of the snapshot file that are not in the original file (outside the snapshot).
1.4) If the file has been deleted from the original folder, it exists only 
within snapshots. The call to 
*dir.getINodesInPath(inodeFile.getName(),FSDirectory.DirOp.READ).validate();* 
will throw an AssertionError in such cases, so in the catch block we can then 
verify two additional conditions:
1.4.1) If we are checking the last snapshot, we can simply count all the blocks 
for the file. 
1.4.2) If this is not the last snapshot, we need to compare the blocks of this 
file with the ones in the last snapshot, and count only those blocks that are 
not in the last snapshot.
2) Renamed files will be resolved as either *INodeReference.DstReference* or 
*INodeReference.WithName*.
2.1) *INodeReference.DstReference* is the case where the file has been renamed 
in the original folder and then snapshotted again. In this case, we only have 
to count the blocks if the original file gets deleted. In such a scenario, 
*referenceIip.getLastINode()* returns null, so we can count the blocks.
2.2) Files in a snapshot that then got renamed in the original folder will be 
*INodeReference.WithName*. If the original file gets deleted outside of the 
snapshot, its blocks then need to be counted. This can be identified by the 
following condition: *referenceIip.getLastINode() == null && 
inode.asFile().getParent() == null*.
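
Below is a rough, illustrative sketch (in Java, as this is NameNode-internal 
code) of the non-renamed-file handling from items 1.1 to 1.4. It is not the 
actual patch: the method names, the *lastSnapshotBlocks*/*isLastSnapshot* 
parameters and the block-diff helper are hypothetical, and the resolution and 
*validate()* calls are copied verbatim from the description above rather than 
verified signatures.

{code:java}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.hdfs.protocol.Block;
import org.apache.hadoop.hdfs.server.namenode.FSDirectory;
import org.apache.hadoop.hdfs.server.namenode.INodeFile;
import org.apache.hadoop.hdfs.server.namenode.INodesInPath;

public class SnapshotBlockCountSketch {

  /**
   * Illustrative only: how many blocks of a file found under a .snapshot path
   * should be added to the fsck total, for the non-renamed case (item 1).
   */
  static int blocksToCount(FSDirectory dir, INodeFile inodeFile,
      Block[] lastSnapshotBlocks, boolean isLastSnapshot) throws IOException {
    // 1.1) Files that also exist outside snapshots are counted on the
    // regular (non-snapshot) pass, so nothing extra is added for them here.
    if (!inodeFile.isWithSnapshot()) {
      return 0;
    }
    try {
      // 1.3) If this resolves (call taken from the description above), the
      // original file still exists outside the snapshot (appended/truncated):
      // count only the blocks of the snapshot copy the current file lacks.
      INodesInPath iip =
          dir.getINodesInPath(inodeFile.getName(), FSDirectory.DirOp.READ);
      iip.validate();
      INodeFile current = iip.getLastINode().asFile();
      return countBlocksNotIn(inodeFile.getBlocks(), current.getBlocks());
    } catch (AssertionError e) {
      // 1.4) The original file was deleted; it exists only in snapshots.
      if (isLastSnapshot) {
        // 1.4.1) Count every block of the snapshot copy.
        return inodeFile.getBlocks().length;
      }
      // 1.4.2) Count only blocks not already counted for the last snapshot.
      return countBlocksNotIn(inodeFile.getBlocks(), lastSnapshotBlocks);
    }
  }

  /** Counts blocks present in {@code candidate} but not in {@code reference}. */
  private static int countBlocksNotIn(Block[] candidate, Block[] reference) {
    Set<Long> referenceIds = new HashSet<>();
    for (Block b : reference) {
      referenceIds.add(b.getBlockId());
    }
    int count = 0;
    for (Block b : candidate) {
      if (!referenceIds.contains(b.getBlockId())) {
        count++;
      }
    }
    return count;
  }
}
{code}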

The current patch implements the conditions described above, along with 12 
additional unit tests for different variations of the possible scenarios.
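
And a similarly hedged sketch for the renamed-file cases in items 2.1 and 2.2. 
Here *referenceIip* is assumed to be the *INodesInPath* resolved for the file's 
current (non-snapshot) path, as in the description; the method name and 
structure are illustrative only, not the patch itself.

{code:java}
import org.apache.hadoop.hdfs.server.namenode.INode;
import org.apache.hadoop.hdfs.server.namenode.INodeReference;
import org.apache.hadoop.hdfs.server.namenode.INodesInPath;

public class RenamedSnapshotFileSketch {

  /**
   * Illustrative only: decides whether the blocks of a renamed file found
   * under a .snapshot path should be counted (item 2).
   */
  static boolean shouldCountRenamedSnapshotFile(INode inode,
      INodesInPath referenceIip) {
    if (inode instanceof INodeReference.DstReference) {
      // 2.1) Renamed in the original folder and snapshotted again: count the
      // blocks only if the original file has been deleted, in which case the
      // resolved path has no last inode.
      return referenceIip.getLastINode() == null;
    }
    if (inode instanceof INodeReference.WithName) {
      // 2.2) Snapshot entry keeping the old name after a rename: count the
      // blocks only once the original file is deleted outside the snapshot.
      return referenceIip.getLastINode() == null
          && inode.asFile().getParent() == null;
    }
    // Non-renamed files are handled by the INodeFile logic sketched above.
    return false;
  }
}
{code}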

> fsck -includeSnapshots reports wrong amount of total blocks
> -----------------------------------------------------------
>
>                 Key: HDFS-12618
>                 URL: https://issues.apache.org/jira/browse/HDFS-12618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>         Attachments: HDFS-121618.initial, HDFS-12618.001.patch, 
> HDFS-12618.002.patch, HDFS-12618.003.patch, HDFS-12618.004.patch
>
>
> When snapshot is enabled, if a file is deleted but is contained by a 
> snapshot, *fsck* will not report blocks for such a file, showing a different 
> number of *total blocks* than what is exposed in the Web UI. 
> This should be fine, as *fsck* provides the *-includeSnapshots* option. The 
> problem is that the *-includeSnapshots* option causes *fsck* to count blocks 
> for every occurrence of a file in snapshots, which is wrong because these 
> blocks should be counted only once (for instance, if a 100MB file is present 
> in 3 snapshots, it still maps to only one block in HDFS). This causes fsck to 
> report many more blocks than actually exist in HDFS and are reported in the 
> Web UI.
> Here's an example:
> 1) HDFS has two files of 2 blocks each:
> {noformat}
> $ hdfs dfs -ls -R /
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 /snap-test
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/file2
> drwxr-xr-x   - root supergroup          0 2017-05-13 13:03 /test
> {noformat} 
> 2) There are two snapshots, with the two files present on each of the 
> snapshots:
> {noformat}
> $ hdfs dfs -ls -R /snap-test/.snapshot
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 
> /snap-test/.snapshot/snap1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap1/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap1/file2
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 
> /snap-test/.snapshot/snap2
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 
> /snap-test/.snapshot/snap2/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 
> /snap-test/.snapshot/snap2/file2
> {noformat}
> 3) *fsck -includeSnapshots* reports 12 blocks in total (4 blocks for the 
> normal file path, plus 4 blocks for each snapshot path):
> {noformat}
> $ hdfs fsck / -includeSnapshots
> FSCK started by root (auth:SIMPLE) from /127.0.0.1 for path / at Mon Oct 09 
> 15:15:36 BST 2017
> Status: HEALTHY
>  Number of data-nodes:        1
>  Number of racks:             1
>  Total dirs:                  6
>  Total symlinks:              0
> Replicated Blocks:
>  Total size:  1258291200 B
>  Total files: 6
>  Total blocks (validated):    12 (avg. block size 104857600 B)
>  Minimally replicated blocks: 12 (100.0 %)
>  Over-replicated blocks:      0 (0.0 %)
>  Under-replicated blocks:     0 (0.0 %)
>  Mis-replicated blocks:               0 (0.0 %)
>  Default replication factor:  1
>  Average block replication:   1.0
>  Missing blocks:              0
>  Corrupt blocks:              0
>  Missing replicas:            0 (0.0 %)
> {noformat}
> 4) Web UI shows the correct number (4 blocks only):
> {noformat}
> Security is off.
> Safemode is off.
> 5 files and directories, 4 blocks = 9 total filesystem object(s).
> {noformat}
> I would like to work on this and will propose an initial solution shortly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
