zhouyingchao created HDFS-10989:
-----------------------------------

             Summary: Cannot get last block length after namenode failover
                 Key: HDFS-10989
                 URL: https://issues.apache.org/jira/browse/HDFS-10989
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: zhouyingchao


On a 2.4 cluster, access to a file failed because the last block's length could not be determined. The fsck output for the file at the moment of failure was:
/user/XXXXXXXXX 483600487 bytes, 2 block(s), OPENFORWRITE:  MISSING 1 blocks of total size 215165031 B
0. BP-219149063-10.108.84.25-1446859315800:blk_2102504098_1035525341 len=268435456 repl=3 [10.112.17.43:11402, 10.118.22.46:11402, 10.118.22.49:11402]
1. BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036219054{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, replicas=[ReplicaUnderConstruction[[DISK]DS-60be75ad-e4a7-4b1e-b3aa-327c85331d42:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-184a1ce9-655a-4e67-b0cc-29ab9984bd0a:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-6d037ac8-4bcc-4cdc-a803-55b1817e0200:NORMAL|RBW]]} len=215165031 MISSING! Recorded locations [10.114.10.14:11402, 10.118.29.3:11402, 10.118.22.42:11402]

From the logs of those three data nodes, we found IOExceptions related to the block as well as pipeline re-creation events.

We found that a namenode failover had occurred before the issue happened, and that several updatePipeline calls had been made to the previously active namenode:
2016-09-27,15:04:36,437 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036137092, newGenerationStamp=1036170430, newLength=2624000, newNodes=[10.118.22.42:11402, 10.118.22.49:11402, 10.118.24.3:11402], clientName=DFSClient_NONMAPREDUCE_-442153643_1)
2016-09-27,15:04:36,438 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036137092) successfully to BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430
2016-09-27,15:10:10,596 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430, newGenerationStamp=1036219054, newLength=17138265, newNodes=[10.118.22.49:11402, 10.118.24.3:11402, 10.114.6.45:11402], clientName=DFSClient_NONMAPREDUCE_-442153643_1)
2016-09-27,15:10:10,601 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430) successfully to BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036219054

However, these new data nodes did not show up in the fsck output. It appears that when a data node recovers the pipeline (PIPELINE_SETUP_STREAMING_RECOVERY), the newly added data nodes do not call notifyNamenodeReceivingBlock for the transferred block.
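The suspected failure mode can be illustrated with a minimal, self-contained sketch (hypothetical code, not actual HDFS internals): the namenode only records replica locations for datanodes that report the block via notifyNamenodeReceivingBlock, so if the datanodes swapped in during streaming recovery never make that call, the namenode's recorded locations go stale and no longer cover the current pipeline.

```java
import java.util.*;

// Hypothetical sketch of the reported bug: datanodes added during
// PIPELINE_SETUP_STREAMING_RECOVERY never report the transferred
// replica, so the namenode's recorded locations become stale.
public class StalePipelineSketch {
    // Namenode-side view: block -> datanodes that have reported a replica.
    static final Map<String, Set<String>> nnRecordedLocations = new HashMap<>();

    // Stand-in for the DataNode-side notifyNamenodeReceivingBlock call.
    static void notifyNamenodeReceivingBlock(String block, String dn) {
        nnRecordedLocations.computeIfAbsent(block, k -> new HashSet<>()).add(dn);
    }

    // True when the current pipeline contains datanodes the namenode
    // never heard from, i.e. its recorded locations are stale.
    static boolean hasStaleEntries(String block, List<String> pipeline) {
        return !nnRecordedLocations
                .getOrDefault(block, Set.of())
                .containsAll(pipeline);
    }

    public static void main(String[] args) {
        String block = "blk_2103114087";
        // Original pipeline: every datanode reports the replica it receives.
        for (String dn : List.of("10.114.10.14", "10.118.29.3", "10.118.22.42"))
            notifyNamenodeReceivingBlock(block, dn);

        // Streaming recovery swaps in new datanodes; the client issues
        // updatePipeline, but (per this report) the new datanodes never
        // call notifyNamenodeReceivingBlock for the transferred block.
        List<String> newPipeline =
                List.of("10.118.22.42", "10.118.22.49", "10.118.24.3");

        // The namenode still only knows the original three locations, so
        // recovery after failover targets datanodes without a current
        // replica and the last block length cannot be determined.
        System.out.println("staleEntries=" + hasStaleEntries(block, newPipeline));
    }
}
```

This mirrors the fsck output above, where the recorded locations are the original three datanodes while the updatePipeline log lines show a different set.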

From code review, the issue also exists in more recent branches.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
