[jira] [Commented] (HDFS-11674) reserveSpaceForReplicas is not released if append request failed due to mirror down and replica recovered

Arpit Agarwal (JIRA) Wed, 10 May 2017 16:12:29 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005613#comment-16005613
 ]


Arpit Agarwal commented on HDFS-11674:
--------------------------------------

+1 for the patch. I am not clear on one thing in the test case:
{code}
    /*
     * Reset the pipeline for the append in such a way that, datanode which is
     * down is one of the mirror, not the first datanode.
     */
    HdfsBlockLocation blockLocation = (HdfsBlockLocation) fs.getClient()
        .getBlockLocations(file.toString(), 0, BLOCK_SIZE)[0];
    LocatedBlock lastBlock = blockLocation.getLocatedBlock();
    // stop 3rd node.
    cluster.stopDataNode(lastBlock.getLocations()[2].getName());
{code}
Could you please clarify how this part works? getBlockLocations sorts the 
replicas of each block by network distance from the caller, randomizing 
replicas at the same distance. So {{lastBlock.getLocations()\[2\]}} may 
sometimes be the first replica in the pipeline.
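
To make the concern concrete, here is a toy model that uses no HDFS classes. ReplicaOrderSketch, Replica, sortByDistance and countCollisions are made-up names, and shuffle-then-stable-sort is only an assumption about how ties at the same distance get randomized. It counts how often the node at index 2 of one lookup turns out to be at index 0 of an independent second lookup:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Toy model only: not the HDFS implementation. All names are hypothetical. */
class ReplicaOrderSketch {

    /** A replica location together with its network distance from the caller. */
    static final class Replica {
        final String name;
        final int distance;

        Replica(String name, int distance) {
            this.name = name;
            this.distance = distance;
        }
    }

    /**
     * Shuffle first, then stable-sort by distance. Replicas at the same
     * distance therefore come back in a random relative order.
     */
    static List<Replica> sortByDistance(List<Replica> replicas, Random rnd) {
        List<Replica> out = new ArrayList<>(replicas);
        Collections.shuffle(out, rnd);
        out.sort(Comparator.comparingInt((Replica r) -> r.distance));
        return out;
    }

    /**
     * Counts the trials in which index 2 of one lookup (the node the test
     * would stop) equals index 0 of a second, independent lookup (standing
     * in for the head of the append pipeline).
     */
    static int countCollisions(List<Replica> replicas, int trials, long seed) {
        Random rnd = new Random(seed);
        int collisions = 0;
        for (int i = 0; i < trials; i++) {
            String stopped = sortByDistance(replicas, rnd).get(2).name;
            String pipelineHead = sortByDistance(replicas, rnd).get(0).name;
            if (stopped.equals(pipelineHead)) {
                collisions++;
            }
        }
        return collisions;
    }
}
```

With three replicas at the same distance the count is non-zero over many trials. With three distinct distances it is always zero, because the order is then fully determined.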

> reserveSpaceForReplicas is not released if append request failed due to 
> mirror down and replica recovered
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11674
>                 URL: https://issues.apache.org/jira/browse/HDFS-11674
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>            Priority: Critical
>              Labels: release-blocker
>         Attachments: HDFS-11674-01.patch, HDFS-11674-02.patch
>
>
> Scenario:
> 1. 3-node cluster with 
> "dfs.client.block.write.replace-datanode-on-failure.policy" set to DEFAULT.
> A block is written with x data.
> 2. One of the DataNodes, NOT the first DN, is down.
> 3. The client tries to append data to the block and fails since one DN is down.
> 4. The client calls recoverLease() on the file.
> 5. Recovery succeeds.
> Issue:
> 1. DNs that the client connected to before encountering the down mirror will 
> have reservedSpaceForReplicas incremented, BUT never decremented. 
> 2. So in the long run all of a DN's space will be held in 
> reservedSpaceForReplicas, resulting in OutOfSpace errors.
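
The leak pattern described in the issue can be illustrated with a toy counter. This is a sketch only and is not taken from the patch: ReservedSpaceSketch, appendLeaky and appendSafe are hypothetical names, and releasing in a finally block is just one generic way to guarantee the decrement. The actual fix may release the reservation on a different path.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model only: names are hypothetical and do not mirror DataNode classes. */
class ReservedSpaceSketch {

    /** Bytes currently reserved for in-flight replica writes. */
    private final AtomicLong reservedForReplicas = new AtomicLong();

    long reserved() {
        return reservedForReplicas.get();
    }

    /** Leaky variant: a failure after reserving skips the release. */
    void appendLeaky(long bytes, boolean mirrorDown) {
        reservedForReplicas.addAndGet(bytes);
        if (mirrorDown) {
            throw new IllegalStateException("mirror down");
        }
        // Only reached on success, so a failed append leaves bytes reserved.
        reservedForReplicas.addAndGet(-bytes);
    }

    /** Safe variant: the release runs on both the success and failure paths. */
    void appendSafe(long bytes, boolean mirrorDown) {
        reservedForReplicas.addAndGet(bytes);
        try {
            if (mirrorDown) {
                throw new IllegalStateException("mirror down");
            }
        } finally {
            reservedForReplicas.addAndGet(-bytes);
        }
    }
}
```

In this model every failed call to appendLeaky permanently adds to the reserved total, which matches the "incremented, BUT never decremented" symptom above.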



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
