[ 
https://issues.apache.org/jira/browse/HDFS-11821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158762#comment-16158762
 ] 

Wellington Chevreuil commented on HDFS-11821:
---------------------------------------------

Thanks for the review and the relevant comments, [~raviprak]! Regarding the tests, 
they all pass for me locally. Apart from 
*hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080*, these are also entirely 
different from the ones in the previous build, and I don't think the patch 
changes would affect pipeline/append/read behaviour, since they only concern 
the metrics update. Pasting test output snippets from my local build:

{noformat}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 85.39 sec - in 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140

Results :

Tests run: 14, Failures: 0, Errors: 0, Skipped: 0
...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hdfs.TestFileAppendRestart
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.935 sec - in 
org.apache.hadoop.hdfs.TestFileAppendRestart

Results :

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
...
Running org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 121.347 sec - 
in org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080

Results :

Tests run: 14, Failures: 0, Errors: 0, Skipped: 0
...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.274 sec - in 
org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0
...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hdfs.TestLeaseRecoveryStriped
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.307 sec - in 
org.apache.hadoop.hdfs.TestLeaseRecoveryStriped

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 62.726 sec - 
in org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure120

Results :

Tests run: 14, Failures: 0, Errors: 0, Skipped: 0
...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hdfs.TestPipelines
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.92 sec - in 
org.apache.hadoop.hdfs.TestPipelines

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 81.293 sec - 
in org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070

Results :

Tests run: 14, Failures: 0, Errors: 0, Skipped: 0
...
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hdfs.TestWriteReadStripedFile
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 73.311 sec - 
in org.apache.hadoop.hdfs.TestWriteReadStripedFile

Results :

Tests run: 17, Failures: 0, Errors: 0, Skipped: 0
{noformat} 

bq. My concern with your patch is that remove will now be a bit slower. I think 
I remember there used to be a time when deletes were holding up the lock for a 
long time. Kihwal Lee Do you have an objection?
I guess the major concern here is that the *countNodes* method iterates over the 
block's *StorageInfo* objects, checking the replica state on each storage to 
determine the replication health. I suppose this iteration is bounded by the 
block's replication factor, so it wouldn't be a large loop. Is this a correct 
assumption, or would there still be other overheads that could impact delete 
performance?
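
Just to illustrate the cost I'm assuming here, below is a minimal sketch of the 
kind of loop I have in mind (simplified, hypothetical types; this is not the 
actual BlockManager code):

{code:java}
import java.util.List;

// Illustrative sketch only, using simplified hypothetical types rather than the
// real BlockManager/NumberReplicas classes: the loop visits one entry per
// storage holding a replica of the block, so its cost is bounded by the block's
// replication factor, not by the cluster size.
class ReplicaCountSketch {
  static class StorageState {
    final boolean corrupt;
    StorageState(boolean corrupt) { this.corrupt = corrupt; }
  }

  // Returns {live, corrupt} replica counts for the storages holding one block.
  static int[] countReplicas(List<StorageState> storagesForBlock) {
    int live = 0, corrupt = 0;
    for (StorageState s : storagesForBlock) { // at most replication-factor iterations
      if (s.corrupt) {
        corrupt++;
      } else {
        live++;
      }
    }
    return new int[] { live, corrupt };
  }
}
{code}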

bq. I'm also wondering what happens when the information returned by countNodes 
is inaccurate (i.e. HDFS hasn't yet realized that the block is corrupt)
In that case, I believe the block would not be in any of the priority queues 
from *LowRedundancyBlocks*. The call to *LowRedundancyBlocks.remove* would then 
not find the block, so no counters would be updated, which I think is the 
correct behaviour. What do you think?

> BlockManager.getMissingReplOneBlocksCount() does not report correct value if 
> corrupt file with replication factor of 1 gets deleted
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11821
>                 URL: https://issues.apache.org/jira/browse/HDFS-11821
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.6.0, 3.0.0-alpha2
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>         Attachments: HDFS-11821-1.patch, HDFS-11821-2.patch
>
>
> *BlockManager* keeps a separate metric for the number of missing blocks with 
> a replication factor of 1. This is currently returned by the 
> *BlockManager.getMissingReplOneBlocksCount()* method, and it is what gets 
> displayed in the attribute below for *dfsadmin -report* (in the example 
> below, there is one corrupt block belonging to a file with replication 
> factor 1):
> {noformat}
> ...
> Missing blocks (with replication factor 1): 1
> ...
> {noformat}
> However, if the related file gets deleted (for instance, using the hdfs fsck 
> -delete option), this metric never gets updated, and *dfsadmin -report* will 
> keep reporting a missing block even though the file does not exist anymore. 
> The only workaround available is to restart the NN so that the metric gets 
> cleared.
> This can be easily reproduced by forcing corruption of a file with 
> replication factor 1, as follows:
> 1) Put a file into hdfs with replication factor 1:
> {noformat}
> $ hdfs dfs -Ddfs.replication=1 -put test_corrupt /
> $ hdfs dfs -ls /
> -rw-r--r--   1 hdfs     supergroup         19 2017-05-10 09:21 /test_corrupt
> {noformat}
> 2) Find the related block for the file and delete it from the DN:
> {noformat}
> $ hdfs fsck /test_corrupt -files -blocks -locations
> ...
> /test_corrupt 19 bytes, 1 block(s):  OK
> 0. BP-782213640-172.31.113.82-1494420317936:blk_1073742742_1918 len=19 
> Live_repl=1 
> [DatanodeInfoWithStorage[172.31.112.178:20002,DS-a0dc0b30-a323-4087-8c36-26ffdfe44f46,DISK]]
> Status: HEALTHY
> ...
> $ find /dfs/dn/ -name blk_1073742742*
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742_1918.meta
> $ rm -rf 
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742
> $ rm -rf 
> /dfs/dn/current/BP-782213640-172.31.113.82-1494420317936/current/finalized/subdir0/subdir3/blk_1073742742_1918.meta
> {noformat}
> 3) Running fsck will report the corruption as expected:
> {noformat}
> $ hdfs fsck /test_corrupt -files -blocks -locations
> ...
> /test_corrupt 19 bytes, 1 block(s): 
> /test_corrupt: CORRUPT blockpool BP-782213640-172.31.113.82-1494420317936 
> block blk_1073742742
>  MISSING 1 blocks of total size 19 B
> ...
> Total blocks (validated):     1 (avg. block size 19 B)
>   ********************************
>   UNDER MIN REPL'D BLOCKS:    1 (100.0 %)
>   dfs.namenode.replication.min:       1
>   CORRUPT FILES:      1
>   MISSING BLOCKS:     1
>   MISSING SIZE:               19 B
>   CORRUPT BLOCKS:     1
> ...
> {noformat}
> 4) The same goes for *dfsadmin -report*:
> {noformat}
> $ hdfs dfsadmin -report
> ...
> Under replicated blocks: 1
> Blocks with corrupt replicas: 0
> Missing blocks: 1
> Missing blocks (with replication factor 1): 1
> ...
> {noformat}
> 5) Running fsck with the *-delete* option does make fsck report the correct 
> information about the corrupt block, but dfsadmin still reports it as missing:
> {noformat}
> $ hdfs fsck /test_corrupt -delete
> ...
> $ hdfs fsck /
> ...
> The filesystem under path '/' is HEALTHY
> ...
> $ hdfs dfsadmin -report
> ...
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> Missing blocks (with replication factor 1): 1
> ...
> {noformat}
> The problem seems to be in the *BlockManager.removeBlock()* method, which in 
> turn uses the utility class *LowRedundancyBlocks*, which classifies blocks 
> according to their current replication level, including blocks currently 
> marked as corrupt. The metric shown by *dfsadmin -report* for corrupt blocks 
> with replication factor 1 is tracked in this *LowRedundancyBlocks*. Whenever 
> a block with replication factor 1 is marked as corrupt, that metric is 
> updated. When removing the block, though, *BlockManager.removeBlock()* calls 
> *LowRedundancyBlocks.remove(BlockInfo block, int priLevel)*, which does not 
> check whether the given block was previously marked as corrupt with 
> replication factor 1, the case that requires updating the metric.
> I am shortly proposing a patch that seems to fix this by making 
> *BlockManager.removeBlock()* call *LowRedundancyBlocks.remove(BlockInfo 
> block, int oldReplicas, int oldReadOnlyReplicas, int outOfServiceReplicas, 
> int oldExpectedReplicas)* instead, which does update the metric properly.
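
For reference, the shape of the change proposed above inside 
*BlockManager.removeBlock()* would be roughly as follows (a sketch only; the 
attached patch is authoritative, and the field and accessor names used here are 
assumptions that may differ across branches):

{code:java}
// Sketch of the intended change inside BlockManager.removeBlock() (not the
// literal patch; names are approximate): gather the block's replica counts
// before removal and call the remove() overload that knows how to adjust the
// "missing blocks with replication factor 1" counter.
NumberReplicas replicas = countNodes(block);        // live/read-only/out-of-service counts
neededReconstruction.remove(block,                  // the LowRedundancyBlocks instance
    replicas.liveReplicas(),
    replicas.readOnlyReplicas(),
    replicas.outOfServiceReplicas(),
    getExpectedRedundancyNum(block));               // the block's expected replication
{code}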


