Brahma Reddy Battula created HDFS-3584:
------------------------------------------
Summary: Blocks are getting marked as corrupt with append operation under high load.
Key: HDFS-3584
URL: https://issues.apache.org/jira/browse/HDFS-3584
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 2.0.1-alpha
Reporter: Brahma Reddy Battula

Scenario:
=========
1. There are two clients, cli1 and cli2. cli1 writes a file F1 and does not close it.
2. cli2 calls append on the unclosed file, which triggers lease recovery.
3. cli1 closes the file.
4. Lease recovery completes with an updated generation stamp (GS) on the DNs, and the NN receives the block reports. Because the reported GS does not match the GS held by the NN, the block is marked as corrupt.
5. The NN then receives a commitBlockSynchronization call, which also fails because the file has already been closed by cli1 and its state in the NN is Finalized.

*Please check the following log for blk_-7909104799008701972:*

{noformat}
2012-06-25 13:48:59,603 INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_-7909104799008701972 added as corrupt on ****DN1:50010 by /****DN1 because block is COMPLETE and reported genstamp 96470 does not match genstamp in block map 96309
2012-06-25 13:48:59,604 DEBUG org.apache.hadoop.hdfs.StateChange: UnderReplicationBlocks.update blk_-7909104799008701972_96309 curReplicas 2 curExpectedReplicas 3 oldReplicas 3 oldExpectedReplicas 3 curPri 2 oldPri 3
2012-06-25 13:48:59,604 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.UnderReplicationBlock.update:blk_-7909104799008701972_96309 has only 2 replicas and needs 3 replicas so is added to neededReplications at priority level 2
2012-06-25 13:48:59,604 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* block RECEIVED_BLOCK: blk_-7909104799008701972_96470 is received from DatanodeRegistration(****DN1, storageID=DS-1986831640-****DN1-50010-1340363042399, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-fdfc6cef-05b1-4900-b5f9-cc275dfd343c;nsid=415242063;c=0)
2012-06-25 13:48:59,607 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_-7909104799008701972_96470 on ****DN2:50010 size 524288 replicaState = FINALIZED
2012-06-25 13:48:59,608 INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_-7909104799008701972 added as corrupt on ****DN2:50010 by /****DN2 because block is COMPLETE and reported genstamp 96470 does not match genstamp in block map 96309
2012-06-25 13:48:59,608 DEBUG org.apache.hadoop.hdfs.StateChange: UnderReplicationBlocks.update blk_-7909104799008701972_96309 curReplicas 1 curExpectedReplicas 3 oldReplicas 2 oldExpectedReplicas 3 curPri 0 oldPri 2
2012-06-25 13:48:59,609 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block blk_-7909104799008701972_96309 from priority queue 2
2012-06-25 13:48:59,609 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.UnderReplicationBlock.update:blk_-7909104799008701972_96309 has only 1 replicas and needs 3 replicas so is added to neededReplications at priority level 0
2012-06-25 13:48:59,610 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* block RECEIVED_BLOCK: blk_-7909104799008701972_96470 is received from DatanodeRegistration(****DN2, storageID=DS-485536663-****DN2-50010-1340362102909, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-fdfc6cef-05b1-4900-b5f9-cc275dfd343c;nsid=415242063;c=0)
2012-06-25 13:48:59,678 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_-7909104799008701972_96470 on ****DN3:50010 size 524288 replicaState = FINALIZED
2012-06-25 13:48:59,679 INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: blk_-7909104799008701972 added as corrupt on ****DN3:50010 by /****DN3 because block is COMPLETE and reported genstamp 96470 does not match genstamp in block map 96309
2012-06-25 13:48:59,681 DEBUG org.apache.hadoop.hdfs.StateChange: UnderReplicationBlocks.update blk_-7909104799008701972_96309 curReplicas 0 curExpectedReplicas 3 oldReplicas 1 oldExpectedReplicas 3 curPri 4 oldPri 0
2012-06-25 13:48:59,681 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.UnderReplicationBlock.remove: Removing block blk_-7909104799008701972_96309 from priority queue 0
2012-06-25 13:48:59,682 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.UnderReplicationBlock.update:blk_-7909104799008701972_96309 has only 0 replicas and needs 3 replicas so is added to neededReplications at priority level 4
2012-06-25 13:48:59,682 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* block RECEIVED_BLOCK: blk_-7909104799008701972_96470 is received from DatanodeRegistration(****DN3, storageID=DS-598160968-****DN3-50010-1340382093938, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-fdfc6cef-05b1-4900-b5f9-cc275dfd343c;nsid=415242063;c=0)
2012-06-25 13:48:59,683 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(lastblock=BP-1988075715-****DN1-1340361925673:blk_-7909104799008701972_96309, newgenerationstamp=96470, newlength=524288, newtargets=[****DN1:50010, ****DN2:50010, ****DN3:50010], closeFile=true, deleteBlock=false)
2012-06-25 13:48:59,683 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: Unexpected block (=BP-1988075715-****DN1-1340361925673:blk_-7909104799008701972_96309) since the file (=append_1654216706419640) is not under construction
2012-06-25 13:48:59,685 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.commitBlockSynchronization from ****DN1:15704: error: java.io.IOException: Unexpected block (=BP-1988075715-****DN1-1340361925673:blk_-7909104799008701972_96309) since the file (=append_1654216706419640) is not under construction
java.io.IOException: Unexpected block (=BP-1988075715-****DN1-1340361925673:blk_-7909104799008701972_96309) since the file (=append_1654216706419640) is not under construction
2012-06-25 13:49:00,413 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block blk_-7909104799008701972_96309 cannot be repl from any node
{noformat}
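
To make the sequence in the scenario and log easier to follow, here is a minimal, self-contained Java sketch that replays it against a toy model. It is not Hadoop code: the class GenStampRaceSketch, the StoredBlock type, and every method name are hypothetical and exist only for illustration. The only facts taken from the report are the two generation stamps (96309 in the block map, 96470 reported by the DNs), the COMPLETE block state, the three replicas, and the two rejection reasons quoted in the log.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the checks described in the log above; hypothetical, not Hadoop code. */
public class GenStampRaceSketch {

    enum BlockState { UNDER_CONSTRUCTION, COMPLETE }

    /** What the NN has recorded for one block (illustrative fields only). */
    static final class StoredBlock {
        final long genStamp;
        final BlockState state;
        int liveReplicas;

        StoredBlock(long genStamp, BlockState state, int liveReplicas) {
            this.genStamp = genStamp;
            this.state = state;
            this.liveReplicas = liveReplicas;
        }
    }

    /** Mirrors the logged reason: block is COMPLETE and the reported genstamp differs. */
    static boolean isReportedReplicaCorrupt(StoredBlock stored, long reportedGenStamp) {
        return stored.state == BlockState.COMPLETE && reportedGenStamp != stored.genStamp;
    }

    /** Handles one replica report and returns the live replica count afterwards. */
    static int processReport(StoredBlock stored, long reportedGenStamp) {
        if (isReportedReplicaCorrupt(stored, reportedGenStamp) && stored.liveReplicas > 0) {
            stored.liveReplicas--; // the replica is counted as corrupt, not live
        }
        return stored.liveReplicas;
    }

    /** Mirrors step 5: the call is rejected once the file is no longer under construction. */
    static boolean commitBlockSyncAccepted(boolean fileUnderConstruction) {
        return fileUnderConstruction;
    }

    /** Replays steps 3 to 5: cli1 has closed the file, then three DNs report the new GS. */
    static List<Integer> replayScenario() {
        StoredBlock stored = new StoredBlock(96309L, BlockState.COMPLETE, 3);
        List<Integer> liveAfterEachReport = new ArrayList<>();
        for (int dn = 1; dn <= 3; dn++) {
            liveAfterEachReport.add(processReport(stored, 96470L));
        }
        return liveAfterEachReport;
    }

    public static void main(String[] args) {
        // prints "live replicas after each report: [2, 1, 0]"
        System.out.println("live replicas after each report: " + replayScenario());
        // prints "commitBlockSynchronization accepted: false"
        System.out.println("commitBlockSynchronization accepted: "
                + commitBlockSyncAccepted(false));
    }
}
```

The replica counts 2, 1, 0 match the curReplicas values in the UnderReplicationBlocks.update entries. In this model the outcome depends only on ordering: if the commitBlockSynchronization had been accepted before the close, the stored GS would already be 96470 and none of the reports would mismatch. Whether that ordering is the right fix in the NN is not something this sketch can answer.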