[
https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235301#comment-13235301
]
VinayaKumar B commented on HDFS-2932:
-------------------------------------
Scenario:
1. Client is writing to a pipeline DN1 --> DN2 --> DN3. Block id e.g. blk_1_1001.
2. DN3 is stopped in between. Pipeline recovery happens and the generation
stamp is bumped, so the block becomes blk_1_1002 (see the sketch after this list).
3. Write is complete, and the stream is closed.
4. DN3 is restarted.
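To make the naming concrete, here is a minimal sketch of what pipeline recovery does to the block identity (hypothetical classes for illustration, not the actual HDFS types):
{code:java}
// A block name is blk_<blockId>_<generationStamp>. Pipeline recovery keeps
// the block id and bumps only the generation stamp, so a DN that missed the
// recovery is left holding a replica with a stale stamp.
public class PipelineRecoverySketch {
    static final class Block {
        final long blockId;
        long generationStamp;
        Block(long blockId, long generationStamp) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
        }
        @Override
        public String toString() {
            return "blk_" + blockId + "_" + generationStamp;
        }
    }

    public static void main(String[] args) {
        Block b = new Block(1, 1001);                 // pipeline DN1 --> DN2 --> DN3
        System.out.println("before recovery: " + b); // blk_1_1001

        b.generationStamp = 1002;                     // recovery after DN3 stops
        System.out.println("after recovery:  " + b); // blk_1_1002
        // DN3 still has blk_1_1001 in RBW state on its disk.
    }
}
{code}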
*Issue Case 1: DN3 coming back after the file is closed.*
----------------------------------------------------
>> Now DN3 will send its block report to the NN, which contains blk_1_1001 in
>> RBW state.
>> Since the file is closed by this time, the NN will mark this replica as
>> corrupt (a sketch of this decision follows below).
>> Replication then cannot succeed, since the NN cannot find one more datanode
>> to place a healthy third replica on.
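A rough sketch of the NN-side decision in this case (method and names are hypothetical; the real logic lives in the NN's block management code):
{code:java}
// Hypothetical sketch of Issue Case 1: a replica reported with an older
// generation stamp against a finalized (closed) file is treated as corrupt.
enum ReportedReplicaAction { ACCEPT, MARK_CORRUPT }

final class BlockReportSketch {
    static ReportedReplicaAction onReportedReplica(long storedGenStamp,
                                                   long reportedGenStamp,
                                                   boolean fileClosed) {
        if (fileClosed && reportedGenStamp < storedGenStamp) {
            // DN3 reports blk_1_1001, but the NN stores blk_1_1002.
            return ReportedReplicaAction.MARK_CORRUPT;
        }
        return ReportedReplicaAction.ACCEPT;
    }

    public static void main(String[] args) {
        // DN3's stale RBW replica, reported after the file was closed:
        System.out.println(onReportedReplica(1002, 1001, true)); // MARK_CORRUPT
    }
}
{code}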
*Issue Case 2: DN3 coming back before the file is closed.*
------------------------------------------------------
>> Now DN3 will send its block report to the NN, which contains blk_1_1001 in
>> RBW state. Since the file is not yet closed, this DN is just added to the
>> targets array.
>> A replication request is sent to another DN (e.g. DN2) to replicate this
>> block to DN3.
>> Now DN3 refuses the replication, throwing ReplicaAlreadyExistsException,
>> because the generation stamp is not considered while checking for the
>> existence of the block (see the sketch after the log below).
{noformat}2012-03-22 08:30:39,406 ERROR datanode.DataNode (DataXceiver.java:run(171)) - 127.0.0.1:59082:DataXceiver error processing WRITE_BLOCK operation src: /127.0.0.1:59124 dest: /127.0.0.1:59082
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1348337625-169.254.103.145-1332385233856:blk_-4842149393874243436_1003 already exists in state RWR and thus cannot be created.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTemporary(FSDataset.java:1740)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:151)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:340)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:167)
        at java.lang.Thread.run(Unknown Source){noformat}
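The refusal comes down to an existence check keyed on the block id alone. A sketch of that behaviour (hypothetical code mirroring the shape of FSDataset.createTemporary from the trace above, not the actual implementation):
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Issue Case 2: the DN replica map is keyed by block
// id only, so the stale blk_1_1001 left on DN3 collides with blk_1_1002,
// the replica DN2 tries to push.
final class ReplicaMapSketch {
    static final class ReplicaAlreadyExistsException extends RuntimeException {
        ReplicaAlreadyExistsException(String msg) { super(msg); }
    }

    // blockId -> generation stamp of the replica already on disk
    private final Map<Long, Long> replicas = new HashMap<>();

    void createTemporary(long blockId, long genStamp) {
        // The lookup ignores the generation stamp, which is exactly what the
        // comment questions: a stale replica blocks the re-replication.
        if (replicas.containsKey(blockId)) {
            throw new ReplicaAlreadyExistsException("Block blk_" + blockId
                + "_" + replicas.get(blockId)
                + " already exists and thus cannot be created.");
        }
        replicas.put(blockId, genStamp);
    }

    public static void main(String[] args) {
        ReplicaMapSketch dn3 = new ReplicaMapSketch();
        dn3.createTemporary(1, 1001); // stale replica from the old pipeline
        try {
            dn3.createTemporary(1, 1002); // DN2's replication attempt
        } catch (ReplicaAlreadyExistsException e) {
            System.out.println(e.getMessage()); // mirrors the log above
        }
    }
}
{code}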
*Basic Queries..?*
1. Why is the generation stamp not considered while comparing blocks?
This behaviour is different compared to version 1.0.
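If the generation stamp were part of the comparison, the stale replica would not shadow the new one. A hedged sketch of the genstamp-aware check the question implies (hypothetical code, not a patch):
{code:java}
// Hypothetical genstamp-aware variant of the existence check: an incoming
// replica collides only when both the block id and the generation stamp
// match; a stale replica with an older stamp could instead be discarded.
final class GenStampAwareCheck {
    static boolean isSameReplica(long storedId, long storedGs,
                                 long incomingId, long incomingGs) {
        return storedId == incomingId && storedGs == incomingGs;
    }

    public static void main(String[] args) {
        // DN3 still holds blk_1_1001; DN2 pushes blk_1_1002.
        System.out.println(isSameReplica(1, 1001, 1, 1002)); // false -> accept
    }
}
{code}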
> Under replicated block after the pipeline recovery.
> ---------------------------------------------------
>
> Key: HDFS-2932
> URL: https://issues.apache.org/jira/browse/HDFS-2932
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node
> Affects Versions: 0.24.0
> Reporter: J.Andreina
> Fix For: 0.24.0
>
>
> Started 1 NN, DN1, DN2, DN3 on the same machine.
> Wrote a huge file of size 2 GB.
> While the write for the block-id-1005 was in progress, DN3 was brought down.
> After the pipeline recovery happened, the block stamp changed into
> block_id_1006 on DN1, DN2.
> After the write was over, DN3 was brought up and the fsck command was issued.
> The following message is displayed:
> "block-id_1006 is under replicated. Target replicas is 3 but found 2 replicas".