[
https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235301#comment-13235301
]
VinayaKumar B commented on HDFS-2932:
-------------------------------------
Scenario:
1. Client is writing to a pipeline DN1 --> DN2 --> DN3. Block id e.g. blk_1_1001.
2. DN3 is stopped in between. Pipeline recovery happens and the generation
stamp is bumped, so the block becomes blk_1_1002 (see the sketch after this list).
3. Write is complete, and the stream is closed.
4. DN3 is restarted.
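To make the naming concrete, here is a minimal sketch of what pipeline recovery does to the block identity (hypothetical classes for illustration, not the actual HDFS types):
{code:java}
// A block name is blk_<blockId>_<generationStamp>. Pipeline recovery keeps
// the block id and bumps only the generation stamp, so a DN that missed the
// recovery is left holding a replica with a stale stamp.
public class PipelineRecoverySketch {
    static final class Block {
        final long blockId;
        long generationStamp;
        Block(long blockId, long generationStamp) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
        }
        @Override
        public String toString() {
            return "blk_" + blockId + "_" + generationStamp;
        }
    }

    public static void main(String[] args) {
        Block b = new Block(1, 1001);                 // pipeline DN1 --> DN2 --> DN3
        System.out.println("before recovery: " + b); // blk_1_1001

        b.generationStamp = 1002;                     // recovery after DN3 stops
        System.out.println("after recovery:  " + b); // blk_1_1002
        // DN3 still has blk_1_1001 in RBW state on its disk.
    }
}
{code}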
*Issue Case 1: DN3 coming back after the file is closed.*
----------------------------------------------------
>> Now DN3 will send its block report to the NN, which contains blk_1_1001 in
>> RBW state.
>> Since the file is closed by this time, the NN will mark this replica as
>> corrupt (a sketch of this decision follows below).
>> Replication then cannot succeed, since the NN cannot find one more datanode
>> to place a healthy third replica on.
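A rough sketch of the NN-side decision in this case (method and names are hypothetical; the real logic lives in the NN's block management code):
{code:java}
// Hypothetical sketch of Issue Case 1: a replica reported with an older
// generation stamp against a finalized (closed) file is treated as corrupt.
enum ReportedReplicaAction { ACCEPT, MARK_CORRUPT }

final class BlockReportSketch {
    static ReportedReplicaAction onReportedReplica(long storedGenStamp,
                                                   long reportedGenStamp,
                                                   boolean fileClosed) {
        if (fileClosed && reportedGenStamp < storedGenStamp) {
            // DN3 reports blk_1_1001, but the NN stores blk_1_1002.
            return ReportedReplicaAction.MARK_CORRUPT;
        }
        return ReportedReplicaAction.ACCEPT;
    }

    public static void main(String[] args) {
        // DN3's stale RBW replica, reported after the file was closed:
        System.out.println(onReportedReplica(1002, 1001, true)); // MARK_CORRUPT
    }
}
{code}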
*Issue Case 2: DN3 coming back before the file is closed.*
------------------------------------------------------
>> Now DN3 will send its block report to the NN, which contains blk_1_1001 in
>> RBW state. Since the file is not yet closed, this DN is just added to the
>> targets array.
>> A replication request is sent to another DN (e.g. DN2) to replicate this
>> block to DN3.
>> Now DN3 refuses the replication, throwing ReplicaAlreadyExistsException,
>> because the generation stamp is not considered while checking for the
>> existence of the block (see the sketch after the log below).
{noformat}2012-03-22 08:30:39,406 ERROR datanode.DataNode (DataXceiver.java:run(171)) - 127.0.0.1:59082:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:59124 dest: /127.0.0.1:59082
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1348337625-169.254.103.145-1332385233856:blk_-4842149393874243436_1003 already exists in state RWR and thus cannot be created.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTemporary(FSDataset.java:1740)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:151)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:340)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:167)
        at java.lang.Thread.run(Unknown Source){noformat}
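The refusal comes down to an existence check keyed on the block id alone. A sketch of that behaviour (hypothetical code mirroring the shape of FSDataset.createTemporary from the trace above, not the actual implementation):
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Issue Case 2: the DN replica map is keyed by block
// id only, so the stale blk_1_1001 left on DN3 collides with blk_1_1002,
// the replica DN2 tries to push.
final class ReplicaMapSketch {
    static final class ReplicaAlreadyExistsException extends RuntimeException {
        ReplicaAlreadyExistsException(String msg) { super(msg); }
    }

    // blockId -> generation stamp of the replica already on disk
    private final Map<Long, Long> replicas = new HashMap<>();

    void createTemporary(long blockId, long genStamp) {
        // The lookup ignores the generation stamp, which is exactly what the
        // comment questions: a stale replica blocks the re-replication.
        if (replicas.containsKey(blockId)) {
            throw new ReplicaAlreadyExistsException("Block blk_" + blockId
                + "_" + replicas.get(blockId)
                + " already exists and thus cannot be created.");
        }
        replicas.put(blockId, genStamp);
    }

    public static void main(String[] args) {
        ReplicaMapSketch dn3 = new ReplicaMapSketch();
        dn3.createTemporary(1, 1001); // stale replica from the old pipeline
        try {
            dn3.createTemporary(1, 1002); // DN2's replication attempt
        } catch (ReplicaAlreadyExistsException e) {
            System.out.println(e.getMessage()); // mirrors the log above
        }
    }
}
{code}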
*Basic Queries..?*
1. Why is the generation stamp not considered while comparing blocks?
This behaviour is different compared to version 1.0.
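If the generation stamp were part of the comparison, the stale replica would not shadow the new one. A hedged sketch of the genstamp-aware check the question implies (hypothetical code, not a patch):
{code:java}
// Hypothetical genstamp-aware variant of the existence check: an incoming
// replica collides only when both the block id and the generation stamp
// match; a stale replica with an older stamp could instead be discarded.
final class GenStampAwareCheck {
    static boolean isSameReplica(long storedId, long storedGs,
                                 long incomingId, long incomingGs) {
        return storedId == incomingId && storedGs == incomingGs;
    }

    public static void main(String[] args) {
        // DN3 still holds blk_1_1001; DN2 pushes blk_1_1002.
        System.out.println(isSameReplica(1, 1001, 1, 1002)); // false -> accept
    }
}
{code}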
> Under replicated block after the pipeline recovery.
> ---------------------------------------------------
>
> Key: HDFS-2932
> URL: https://issues.apache.org/jira/browse/HDFS-2932
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node
> Affects Versions: 0.24.0
> Reporter: J.Andreina
> Fix For: 0.24.0
>
>
> Started 1 NN, DN1, DN2, DN3 on the same machine.
> Wrote a huge file of size 2 GB.
> While the write for the block-id-1005 was in progress, DN3 was brought down.
> After the pipeline recovery happened, the block stamp changed into
> block_id_1006 on DN1, DN2.
> After the write was over, DN3 was brought up and the fsck command was issued.
> The following message is displayed:
> "block-id_1006 is under replicated. Target replicas is 3 but found 2 replicas".