HDFS should recover when replicas of a block have different sizes (due to a
corrupted block)
-----------------------------------------------------------------------------------------

                 Key: HADOOP-2890
                 URL: https://issues.apache.org/jira/browse/HADOOP-2890
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.16.0
            Reporter: lohit vijayarenu


We had a case where reading a file caused an IOException:
08/02/25 17:23:02 INFO fs.DFSClient: Could not obtain block 
blk_-8333897631311887285 from any node:  java.io.IOException: No live nodes 
contain current block

hadoop fsck, however, reported the block as healthy:
[lohit]$ hadoop fsck part-04344 -files -blocks -locations | grep 
8333897631311887285
21. -8333897631311887285 len=134217728 repl=3 [74.6.129.238:50010, 
74.6.133.231:50010, 74.6.128.158:50010]

Searching the logs for this block turned up the following warning in the
namenode log:
17:26:23,543 WARN org.apache.hadoop.fs.FSNamesystem: Inconsistent size for 
block blk_-8333897631311887285 reported from 74.6.133.231:50010 current size is 
134217728 reported size is 134205440

So the namenode was expecting 134217728 bytes while the actual block size was
134205440 bytes, a shortfall of 12288 bytes (roughly 12K).
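
In other words, the namenode detects the inconsistency when the replica is
reported, but, as the timeline below shows, nothing happens beyond the WARN
message. A minimal sketch of the comparison (names are illustrative, not the
actual FSNamesystem code):

    // Illustrative sketch only, not the actual FSNamesystem code.
    class ReportedSizeCheck {
        static void check(String blockId, String datanode,
                          long currentSize, long reportedSize) {
            if (currentSize != reportedSize) {
                // This is the condition behind the WARN above; today the
                // short replica is neither invalidated nor re-replicated.
                System.err.println("Inconsistent size for block " + blockId
                        + " reported from " + datanode
                        + " current size is " + currentSize
                        + " reported size is " + reportedSize);
            }
        }
    }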

Dhruba took a further look at the logs and we found out this is what had
happened:
1. While the file was being created, this block was replicated to three nodes,
of which two had a correct-sized block but the third had a partial/truncated
block (the metadata, however, was the same on all nodes).
2. Three days later the namenode was restarted, at which point the third node
reported the block with an incorrect size and the namenode logged the warning
above.
3. After a few more days the first two nodes went down, and the third node
replicated its partial/truncated block to two new nodes.
4. When we then tried to read this block, we hit the IOException.
5. On all the nodes, the metadata corresponded to the original valid block
while the block itself was missing around 12K of data.

Two problems could be fixed here:
1. When the namenode identifies replicas with different block sizes (point 2
above), it could choose the biggest replica and discard the smaller ones. If
the block is not the last block of the file, its size has to equal the
configured block size; anything less than that can be considered a bad
replica. (A sketch follows this list.)
2. The datanode's periodic block verifier could also check that the metadata
size matches that of the block actually present on disk. Any mismatch should
be reported/recovered along the lines of step 1. (A second sketch follows.)
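
A minimal sketch of the recovery policy in (1), assuming a hypothetical
resolver invoked when replica sizes disagree (class and method names are
illustrative, not existing Hadoop APIs):

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of fix (1); not an existing Hadoop API.
    class ReplicaSizeResolver {

        // A reported replica: the datanode that holds it and its size.
        static class Replica {
            final String datanode;
            final long size;
            Replica(String datanode, long size) {
                this.datanode = datanode;
                this.size = size;
            }
        }

        // Returns the replicas that should be discarded. For a non-final
        // block the only valid size is the full block size; for the last
        // block of a file, trust the largest replica and discard smaller
        // copies.
        static List<Replica> replicasToDiscard(List<Replica> replicas,
                                               long blockSize,
                                               boolean isLastBlock) {
            long expected = blockSize;
            if (isLastBlock) {
                expected = 0;
                for (Replica r : replicas) {
                    expected = Math.max(expected, r.size);
                }
            }
            List<Replica> discard = new ArrayList<Replica>();
            for (Replica r : replicas) {
                if (r.size < expected) {
                    // Invalidate and re-replicate from a good copy.
                    discard.add(r);
                }
            }
            return discard;
        }
    }

In the case above, this would have invalidated the truncated 134205440-byte
replica at restart time, while it could still be re-replicated from the two
good copies.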
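And a sketch of the extra check in (2) for the datanode's periodic block
verifier. The .meta layout constants below are assumptions (a 4-byte checksum
per 512-byte chunk, plus a small fixed header), so treat this as an outline
rather than the real verifier code:

    import java.io.File;

    // Outline of fix (2); constants and names are assumptions.
    class BlockLengthVerifier {
        static final int BYTES_PER_CHECKSUM = 512; // assumed chunk size
        static final int CHECKSUM_SIZE = 4;        // assumed checksum width
        static final int META_HEADER_SIZE = 7;     // assumed .meta header

        // Returns true if the block file's on-disk length is consistent
        // with the number of checksum chunks recorded in its .meta file.
        static boolean lengthMatchesMeta(File blockFile, File metaFile) {
            long chunks =
                (metaFile.length() - META_HEADER_SIZE) / CHECKSUM_SIZE;
            long maxLen = chunks * BYTES_PER_CHECKSUM;
            // The last chunk may be partial, so any length in
            // (maxLen - 512, maxLen] is acceptable.
            long actual = blockFile.length();
            return actual > maxLen - BYTES_PER_CHECKSUM && actual <= maxLen;
        }
    }

On the truncated replica above, the metadata described the full
134217728-byte block while the block file held only 134205440 bytes, so this
check would have flagged the replica long before the two good copies were
lost.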


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
