Brahma Reddy Battula created HDFS-3584:
------------------------------------------

             Summary: Blocks are getting marked as corrupt with append 
operation under high load.
                 Key: HDFS-3584
                 URL: https://issues.apache.org/jira/browse/HDFS-3584
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 2.0.1-alpha
            Reporter: Brahma Reddy Battula


Scenario:
========= 
1. There are two clients, cli1 and cli2. cli1 writes a file F1 and does not 
close it.
2. cli2 calls append on the unclosed file, which triggers lease recovery.
3. cli1 closes the file.
4. Lease recovery completes and the DNs hold replicas with the updated 
generation stamp (GS). When the NN receives the block report, the reported GS 
does not match the GS in its block map, so the replicas are marked as corrupt.
5. The NN then receives commitBlockSynchronization, which also fails because 
the file is already closed by cli1 and the block is finalized (COMPLETE) in 
the NN.
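
The ordering above can be replayed with a small toy model. This is only a 
sketch to illustrate the race, not the actual NameNode code: the class 
GenStampRaceSketch and all of its fields and methods are hypothetical. The 
generation stamps 96309 and 96470 and the error message are taken from the 
log in this report.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model of the NN-side state for one block. All names are hypothetical
// and do not correspond to real HDFS classes.
class GenStampRaceSketch {
    long genStampInBlockMap;            // GS the NN has recorded for the block
    boolean underConstruction = true;   // true while the file is open for write
    final List<String> corruptReplicas = new ArrayList<>();

    GenStampRaceSketch(long initialGenStamp) {
        this.genStampInBlockMap = initialGenStamp;
    }

    // Step 3: cli1 closes the file, so the block is COMPLETE with the old GS.
    void closeFile() {
        underConstruction = false;
    }

    // Step 4: a DN reports a replica. For a COMPLETE block, a GS mismatch
    // marks that replica as corrupt.
    void reportReplica(String dataNode, long reportedGenStamp) {
        if (!underConstruction && reportedGenStamp != genStampInBlockMap) {
            corruptReplicas.add(dataNode);
        }
    }

    // Step 5: recovery tries to commit the new GS, but the file is closed.
    void commitBlockSynchronization(long newGenStamp) throws IOException {
        if (!underConstruction) {
            throw new IOException(
                "Unexpected block since the file is not under construction");
        }
        genStampInBlockMap = newGenStamp;
    }

    // Replays steps 3 and 4 of the scenario and returns the resulting state.
    static GenStampRaceSketch replayScenario() {
        GenStampRaceSketch nn = new GenStampRaceSketch(96309L);
        nn.closeFile();
        for (String dn : new String[] {"DN1", "DN2", "DN3"}) {
            nn.reportReplica(dn, 96470L);
        }
        return nn;
    }

    public static void main(String[] args) {
        GenStampRaceSketch nn = replayScenario();
        System.out.println("corrupt replicas: " + nn.corruptReplicas);
        try {
            nn.commitBlockSynchronization(96470L);   // step 5
        } catch (IOException e) {
            System.out.println("commit failed: " + e.getMessage());
        }
    }
}
```

In this model all three replicas end up marked corrupt and the GS in the 
block map is never updated, which matches the "cannot be repl from any node" 
outcome at the end of the log.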

*Please check the following log for blk_-7909104799008701972:*



{noformat}
2012-06-25 13:48:59,603 INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_-7909104799008701972 added as corrupt 
on ****DN1:50010 by /****DN1 because block is COMPLETE and reported genstamp 
96470 does not match genstamp in block map 96309
2012-06-25 13:48:59,604 DEBUG org.apache.hadoop.hdfs.StateChange: 
UnderReplicationBlocks.update blk_-7909104799008701972_96309 curReplicas 2 
curExpectedReplicas 3 oldReplicas 3 oldExpectedReplicas  3 curPri  2 oldPri  3
2012-06-25 13:48:59,604 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.UnderReplicationBlock.update:blk_-7909104799008701972_96309 has only 
2 replicas and needs 3 replicas so is added to neededReplications at priority 
level 2
2012-06-25 13:48:59,604 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* block 
RECEIVED_BLOCK: blk_-7909104799008701972_96470 is received from 
DatanodeRegistration(****DN1, 
storageID=DS-1986831640-****DN1-50010-1340363042399, infoPort=50075, 
ipcPort=50020, 
storageInfo=lv=-40;cid=CID-fdfc6cef-05b1-4900-b5f9-cc275dfd343c;nsid=415242063;c=0)
2012-06-25 13:48:59,607 DEBUG 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block 
blk_-7909104799008701972_96470 on ****DN2:50010 size 524288 replicaState = 
FINALIZED
2012-06-25 13:48:59,608 INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_-7909104799008701972 added as corrupt 
on ****DN2:50010 by /****DN2 because block is COMPLETE and reported genstamp 
96470 does not match genstamp in block map 96309
2012-06-25 13:48:59,608 DEBUG org.apache.hadoop.hdfs.StateChange: 
UnderReplicationBlocks.update blk_-7909104799008701972_96309 curReplicas 1 
curExpectedReplicas 3 oldReplicas 2 oldExpectedReplicas  3 curPri  0 oldPri  2
2012-06-25 13:48:59,609 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.UnderReplicationBlock.remove: Removing block 
blk_-7909104799008701972_96309 from priority queue 2
2012-06-25 13:48:59,609 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.UnderReplicationBlock.update:blk_-7909104799008701972_96309 has only 
1 replicas and needs 3 replicas so is added to neededReplications at priority 
level 0
2012-06-25 13:48:59,610 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* block 
RECEIVED_BLOCK: blk_-7909104799008701972_96470 is received from 
DatanodeRegistration(****DN2, 
storageID=DS-485536663-****DN2-50010-1340362102909, infoPort=50075, 
ipcPort=50020, 
storageInfo=lv=-40;cid=CID-fdfc6cef-05b1-4900-b5f9-cc275dfd343c;nsid=415242063;c=0)
2012-06-25 13:48:59,678 DEBUG 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block 
blk_-7909104799008701972_96470 on ****DN3:50010 size 524288 replicaState = 
FINALIZED
2012-06-25 13:48:59,679 INFO org.apache.hadoop.hdfs.StateChange: BLOCK 
NameSystem.addToCorruptReplicasMap: blk_-7909104799008701972 added as corrupt 
on ****DN3:50010 by /****DN3 because block is COMPLETE and reported genstamp 
96470 does not match genstamp in block map 96309
2012-06-25 13:48:59,681 DEBUG org.apache.hadoop.hdfs.StateChange: 
UnderReplicationBlocks.update blk_-7909104799008701972_96309 curReplicas 0 
curExpectedReplicas 3 oldReplicas 1 oldExpectedReplicas  3 curPri  4 oldPri  0
2012-06-25 13:48:59,681 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.UnderReplicationBlock.remove: Removing block 
blk_-7909104799008701972_96309 from priority queue 0
2012-06-25 13:48:59,682 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* 
NameSystem.UnderReplicationBlock.update:blk_-7909104799008701972_96309 has only 
0 replicas and needs 3 replicas so is added to neededReplications at priority 
level 4
2012-06-25 13:48:59,682 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* block 
RECEIVED_BLOCK: blk_-7909104799008701972_96470 is received from 
DatanodeRegistration(****DN3, 
storageID=DS-598160968-****DN3-50010-1340382093938, infoPort=50075, 
ipcPort=50020, 
storageInfo=lv=-40;cid=CID-fdfc6cef-05b1-4900-b5f9-cc275dfd343c;nsid=415242063;c=0)
2012-06-25 13:48:59,683 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
commitBlockSynchronization(lastblock=BP-1988075715-****DN1-1340361925673:blk_-7909104799008701972_96309,
 newgenerationstamp=96470, newlength=524288, newtargets=[****DN1:50010, 
****DN2:50010, ****DN3:50010], closeFile=true, deleteBlock=false)
2012-06-25 13:48:59,683 ERROR org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:root (auth:SIMPLE) cause:java.io.IOException: 
Unexpected block 
(=BP-1988075715-****DN1-1340361925673:blk_-7909104799008701972_96309) since the 
file (=append_1654216706419640) is not under construction
2012-06-25 13:48:59,685 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 
on 8020, call 
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.commitBlockSynchronization
 from ****DN1:15704: error: java.io.IOException: Unexpected block 
(=BP-1988075715-****DN1-1340361925673:blk_-7909104799008701972_96309) since the 
file (=append_1654216706419640) is not under construction
java.io.IOException: Unexpected block 
(=BP-1988075715-****DN1-1340361925673:blk_-7909104799008701972_96309) since the 
file (=append_1654216706419640) is not under construction
2012-06-25 13:49:00,413 DEBUG 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block 
blk_-7909104799008701972_96309 cannot be repl from any node

{noformat}
