[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-3091:
--------------------------------------

         Description: 
When verifying the HDFS-1606 feature, we observed a couple of issues.

Presently the ReplaceDatanodeOnFailure policy is satisfied even when the 
cluster does not have enough DNs to replace the failed one, which results in a 
write failure.

{quote}
12/03/13 14:27:12 WARN hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode: nodes.length != original.length 
+ 1, nodes=[xx.xx.xx.xx:50010], original=[xx.xx.xx.xx1:50010]
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416)
{quote}


Let's take some cases:
1) Replication factor is 3, cluster size is also 3, and unfortunately the 
pipeline drops to 1.

ReplaceDatanodeOnFailure will be satisfied because *existings(1) <= 
replication/2 (3/2==1)*.

But when it looks for a new node to replace the failed one, it obviously 
cannot find one, and the sanity check will fail.

This results in a write failure.

2) Replication factor is 10 (the user accidentally sets the replication 
factor higher than the cluster size), and the cluster has only 5 datanodes.

  Here, even if only one node fails, the write will fail for the same reason. 
The pipeline can be at most 5, and after one datanode is killed, existings 
will be 4.

  *existings(4) <= replication/2 (10/2==5)* will be satisfied, and obviously 
it cannot replace the failed node with a new one, as no extra nodes exist in 
the cluster. This results in a write failure.

3) Sync-related operations also fail in these situations (will post the clear 
scenarios).
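The gap described in the cases above can be sketched as follows. This is a simplified illustration, not the actual DFSOutputStream/ReplaceDatanodeOnFailure code; the class and method names here are hypothetical. It shows that the default condition *existings <= replication/2* looks only at the replication factor and the surviving pipeline, never at whether a spare live datanode actually exists, so the later sanity check in findNewDatanode is bound to fail.

```java
// Simplified sketch (NOT the real Hadoop implementation) of the default
// replace-datanode-on-failure condition and the availability check that
// the policy never consults before demanding a replacement.
public class ReplacePolicySketch {

    /** Default policy described above: replace when existings <= replication/2. */
    static boolean policySatisfied(int replication, int existings) {
        return existings <= replication / 2;  // integer division: 3/2 == 1, 10/2 == 5
    }

    /** A replacement exists only if some live node is not already in the pipeline. */
    static boolean replacementAvailable(int liveDatanodes, int existings) {
        return liveDatanodes > existings;
    }

    /** The write fails when the policy demands a new node but none can be added. */
    static boolean writeFails(int replication, int liveDatanodes, int existings) {
        return policySatisfied(replication, existings)
                && !replacementAvailable(liveDatanodes, existings);
    }

    public static void main(String[] args) {
        // Case 1: replication 3, cluster of 3, pipeline drops to 1 live node.
        System.out.println(writeFails(3, 1, 1));   // true

        // Case 2: replication 10, cluster of 5, one datanode killed:
        // 4 live nodes, all 4 already in the pipeline.
        System.out.println(writeFails(10, 4, 4));  // true
    }
}
```

Under this simplified model, both cases fail for the same reason: policySatisfied is true while replacementAvailable is false, because the condition never takes cluster capacity into account.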


    Target Version/s: 0.24.0, 0.23.3  (was: 0.23.3, 0.24.0)
    
> Failure to add a new DataNode to the pipeline results in write failure.
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-3091
>                 URL: https://issues.apache.org/jira/browse/HDFS-3091
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, hdfs client, name-node
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Uma Maheswara Rao G
>
> When verifying the HDFS-1606 feature, we observed a couple of issues.
> Presently the ReplaceDatanodeOnFailure policy is satisfied even when the 
> cluster does not have enough DNs to replace the failed one, which results in 
> a write failure.
> {quote}
> 12/03/13 14:27:12 WARN hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to add a datanode: nodes.length != 
> original.length + 1, nodes=[xx.xx.xx.xx:50010], original=[xx.xx.xx.xx1:50010]
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741)
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416)
> {quote}
> Let's take some cases:
> 1) Replication factor is 3, cluster size is also 3, and unfortunately the 
> pipeline drops to 1.
> ReplaceDatanodeOnFailure will be satisfied because *existings(1) <= 
> replication/2 (3/2==1)*.
> But when it looks for a new node to replace the failed one, it obviously 
> cannot find one, and the sanity check will fail.
> This results in a write failure.
> 2) Replication factor is 10 (the user accidentally sets the replication 
> factor higher than the cluster size), and the cluster has only 5 datanodes.
>   Here, even if only one node fails, the write will fail for the same 
> reason. The pipeline can be at most 5, and after one datanode is killed, 
> existings will be 4.
>   *existings(4) <= replication/2 (10/2==5)* will be satisfied, and obviously 
> it cannot replace the failed node with a new one, as no extra nodes exist in 
> the cluster. This results in a write failure.
> 3) Sync-related operations also fail in these situations (will post the 
> clear scenarios).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
