[ 
https://issues.apache.org/jira/browse/HDFS-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896532#comment-15896532
 ] 

ASF GitHub Bot commented on HDFS-11499:
---------------------------------------

GitHub user lukmajercak opened a pull request:

    https://github.com/apache/hadoop/pull/199

    HDFS-11499 Decommissioning stuck because of failing recovery

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lukmajercak/hadoop HDFS-11499

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/199.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #199
    
----
commit 3609b1353e64a24dee4746b8fa23ed7547768d68
Author: Lukas Majercak <[email protected]>
Date:   2017-03-05T20:04:06Z

    HDFS-11499 add TestDecommission.testDecommissionWithOpenFileAndDatanodeFailing
    for testing recovery

commit 3f97d89f75d8a20f878da8c438141f9b6adf7da0
Author: Lukas Majercak <[email protected]>
Date:   2017-03-05T20:05:08Z

    HDFS-11499 count decommissioning replicas when completing last block in
    BlockManager.commitOrCompleteLastBlock

----


> Decommissioning stuck because of failing recovery
> -------------------------------------------------
>
>                 Key: HDFS-11499
>                 URL: https://issues.apache.org/jira/browse/HDFS-11499
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>    Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha2
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>
> Block recovery fails to finalize a file when the nodes holding the replicas 
> of its last, incomplete block are being decommissioned. Conversely, 
> decommissioning gets stuck waiting for that last block to be completed.
> {code:xml}
> org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): 
> Failed to finalize INodeFile testRecoveryFile since blocks[255] is 
> non-complete, where blocks=[blk_1073741825_1001, blk_1073741826_1002...
> {code}
> The fix is to count replicas on decommissioning nodes when completing the 
> last block in BlockManager.commitOrCompleteLastBlock. This is safe because 
> the DecommissionManager will not decommission a node that still has 
> under-construction (UC) blocks.
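
The idea behind the fix can be illustrated with a minimal, self-contained sketch. It is not the code from the pull request: the class LastBlockCompletionSketch, the ReplicaState enum, and both methods are hypothetical stand-ins for the NameNode's replica accounting, and the minimum-replication check is assumed to be a simple threshold comparison.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the replica check; these names are NOT the Hadoop API.
final class LastBlockCompletionSketch {

    // Simplified replica states, assumed for illustration only.
    enum ReplicaState { LIVE, DECOMMISSIONING, CORRUPT }

    // Counts the replicas allowed to satisfy minimum replication for the last block.
    // With countDecommissioning == false only LIVE replicas count (behavior described
    // in the report); with true, replicas on decommissioning nodes count as well.
    static int usableReplicas(List<ReplicaState> replicas, boolean countDecommissioning) {
        int usable = 0;
        for (ReplicaState state : replicas) {
            boolean live = state == ReplicaState.LIVE;
            boolean decommissioning = state == ReplicaState.DECOMMISSIONING;
            if (live || (countDecommissioning && decommissioning)) {
                usable++;
            }
        }
        return usable;
    }

    // The last block can be completed once enough usable replicas have been reported.
    static boolean canCompleteLastBlock(List<ReplicaState> replicas,
                                        int minReplication,
                                        boolean countDecommissioning) {
        return usableReplicas(replicas, countDecommissioning) >= minReplication;
    }

    public static void main(String[] args) {
        // Every location of the last block is on a node that is being decommissioned.
        List<ReplicaState> replicas = Arrays.asList(
                ReplicaState.DECOMMISSIONING,
                ReplicaState.DECOMMISSIONING,
                ReplicaState.DECOMMISSIONING);

        System.out.println("live replicas only:       "
                + canCompleteLastBlock(replicas, 1, false));
        System.out.println("decommissioning included: "
                + canCompleteLastBlock(replicas, 1, true));
    }
}
```

In this model, a block whose replicas all sit on decommissioning nodes never reaches the minimum when only live replicas are counted, which corresponds to the deadlock described above; counting decommissioning replicas lets the block complete so that decommissioning can proceed.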



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

