[ https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251747#comment-16251747 ]
Wellington Chevreuil commented on HDFS-12618:
---------------------------------------------

I'm having problems with append and truncate. In both cases, *iip.getLastINode()* returns an instance of *INodeFile*.

For append, the original block of the file is always kept. Say a file *file1* originally had only 1 block, and a snapshot *snap1* was taken of its folder; an append was then performed, so *file1* now has a total of 2 blocks. The block of the *snap1* copy of the file is the 1st block of the live file, so the snapshot copy should not have its block counted again. I thought this would be feasible by performing the following check:

{noformat}
if (inodeFile.isWithSnapshot()
    && inodeFile.getFileWithSnapshotFeature().getDiffs().getLastSnapshotId()
        == iip.getPathSnapshotId()
    && inodeFile.getFileWithSnapshotFeature().isCurrentFileDeleted()) {
  replRes.totalBlocks += inodeFile.getBlocks().length;
}
{noformat}

I guess this would work, as we only have to count blocks for files inside snapshots if the original file has been removed and the file is in the last snapshot. The problem here is that *inodeFile.getFileWithSnapshotFeature().isCurrentFileDeleted()* returns *true* only as long as the *DELETE* operation that removed the original file is younger than the last fsimage, i.e. the deletion is still recorded only in the edit log. Once a checkpoint happens and the NN is restarted, *isCurrentFileDeleted()* returns *false* for all snapshot files, breaking the logic above. I believe it's possible to set that flag from information available in the fsimage at image loading time, but I'm not sure that's the expected behaviour.

The truncate problem is harder. Truncate generates new blocks (because it shrinks the file), so the snapshot copy keeps blocks the live file no longer has, and such a file would need to pass the checks above to have them counted; yet for truncate, *isCurrentFileDeleted()* will never return *true* (because no deletion ever happened). We could obviously have truncate update this flag as well, but that does not seem right.

These problems make me feel it would be simpler if we had a way to get all related file paths for a given block from the block manager, so that during the snapshot check we could count blocks only for those paths that are actually in snapshots. I don't know if there is any way to do that today; I have found *BlockManager.getStoredBlock(Block)*, but the returned *BlockInfo* instance only carries the id of the last inode in the file path, which in this case is the same for both the live path and the snapshot path.
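To make that last point concrete, here is a sketch of the probe I mean, assuming the 3.x accessors *BlockInfo.getBlockCollectionId()* and *FSDirectory.getInode(long)*; the variables *blockManager*, *fsDir* and *blk* are presumed to be in scope, and this is illustrative only, not a proposed patch:

{noformat}
// Illustrative only: resolve a block back to its owning inode. The inode id
// obtained here is the same whether the block was reached through the live
// path or through a snapshot path, which is exactly the limitation above.
BlockInfo stored = blockManager.getStoredBlock(blk); // blk is the Block in hand
long inodeId = stored.getBlockCollectionId();        // id of the owning INodeFile
INode inode = fsDir.getInode(inodeId);
String path = inode.getFullPathName();               // always the live path,
                                                     // never a /.snapshot/ path
{noformat}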
> fsck -includeSnapshots reports wrong amount of total blocks
> -----------------------------------------------------------
>
>                 Key: HDFS-12618
>                 URL: https://issues.apache.org/jira/browse/HDFS-12618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>         Attachments: HDFS-121618.initial, HDFS-12618.001.patch, HDFS-12618.002.patch, HDFS-12618.003.patch
>
> When snapshots are enabled, if a file is deleted but is contained by a snapshot, *fsck* will not report blocks for such a file, showing a different number of *total blocks* than what is exposed in the Web UI.
> This should be fine, as *fsck* provides the *-includeSnapshots* option. The problem is that the *-includeSnapshots* option causes *fsck* to count blocks for every occurrence of a file in snapshots, which is wrong because these blocks should be counted only once (for instance, if a 100MB file is present on 3 snapshots, it would still map to only one block in HDFS). This causes fsck to report many more blocks than actually exist in HDFS and are reported in the Web UI.
> Here's an example:
> 1) HDFS has two files of 2 blocks each:
> {noformat}
> $ hdfs dfs -ls -R /
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 /snap-test
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/file2
> drwxr-xr-x   - root supergroup          0 2017-05-13 13:03 /test
> {noformat}
> 2) There are two snapshots, with the two files present on each of the snapshots:
> {noformat}
> $ hdfs dfs -ls -R /snap-test/.snapshot
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 /snap-test/.snapshot/snap1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/.snapshot/snap1/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/.snapshot/snap1/file2
> drwxr-xr-x   - root supergroup          0 2017-10-07 21:21 /snap-test/.snapshot/snap2
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:16 /snap-test/.snapshot/snap2/file1
> -rw-r--r--   1 root supergroup  209715200 2017-10-07 20:17 /snap-test/.snapshot/snap2/file2
> {noformat}
> 3) *fsck -includeSnapshots* reports 12 blocks in total (4 blocks for the normal file paths, plus 4 blocks for each snapshot path):
> {noformat}
> $ hdfs fsck / -includeSnapshots
> FSCK started by root (auth:SIMPLE) from /127.0.0.1 for path / at Mon Oct 09 15:15:36 BST 2017
> Status: HEALTHY
>  Number of data-nodes:          1
>  Number of racks:               1
>  Total dirs:                    6
>  Total symlinks:                0
> Replicated Blocks:
>  Total size:                    1258291200 B
>  Total files:                   6
>  Total blocks (validated):      12 (avg. block size 104857600 B)
>  Minimally replicated blocks:   12 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    1
>  Average block replication:     1.0
>  Missing blocks:                0
>  Corrupt blocks:                0
>  Missing replicas:              0 (0.0 %)
> {noformat}
> 4) The Web UI shows the correct number (4 blocks only):
> {noformat}
> Security is off.
> Safemode is off.
> 5 files and directories, 4 blocks = 9 total filesystem object(s).
> {noformat}
> I would like to work on this solution, will propose an initial solution shortly.
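To illustrate the counting rule stated in the description (each block counted only once, no matter how many snapshot paths reach it), here is a hypothetical sketch; *countedBlockIds* and *collectBlocks* do not exist in *NamenodeFsck*, and the fragment is assumed to live inside the fsck visitor class next to the existing *replRes* counters:

{noformat}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not existing NamenodeFsck code: deduplicate by block id
// so that a block reachable both as /snap-test/file1 and as
// /snap-test/.snapshot/snap1/file1 is counted exactly once per fsck run.
private final Set<Long> countedBlockIds = new HashSet<>();

private void collectBlocks(INodeFile inodeFile, Result replRes) {
  for (BlockInfo block : inodeFile.getBlocks()) {
    if (countedBlockIds.add(block.getBlockId())) { // false if already counted
      replRes.totalBlocks++;
    }
  }
}
{noformat}

Under this scheme, the example above would report 4 total blocks instead of 12, matching the Web UI.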