Lukas Majercak created HDFS-11499:
-------------------------------------

Summary: Decommissioning stuck because of failing recovery
Key: HDFS-11499
URL: https://issues.apache.org/jira/browse/HDFS-11499
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs, namenode
Affects Versions: 3.0.0-alpha2, 2.7.3, 2.7.2, 2.7.1
Reporter: Lukas Majercak
Assignee: Lukas Majercak

Block recovery fails to finalize the file if the locations of the last, incomplete block are being decommissioned. Conversely, decommissioning gets stuck waiting for the last block to be completed.

{code:xml}
org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Failed to finalize INodeFile testRecoveryFile since blocks[255] is non-complete, where blocks=[blk_1073741825_1001, blk_1073741826_1002...
{code}

The fix is to count replicas on decommissioning nodes when completing the last block in BlockManager.commitOrCompleteLastBlock, since we know that the DecommissionManager will not decommission a node that has UC (under-construction) blocks.
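
To illustrate the deadlock and the proposed change, here is a minimal, self-contained sketch. It is not the actual BlockManager code: the class LastBlockCompletionSketch, the enum ReplicaState, and the methods countUsableReplicas and canCompleteLastBlock are hypothetical names invented for this example, and the minimum-replication check is simplified to a plain count comparison. It only shows why counting replicas on decommissioning nodes lets the last block reach the required replica count.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model; none of these names come from the Hadoop code base.
class LastBlockCompletionSketch {

    // Simplified state of the datanode holding a replica of the last block.
    enum ReplicaState { LIVE, DECOMMISSIONING, DECOMMISSIONED }

    // Counts replicas that may be used to satisfy minimum replication.
    // With includeDecommissioning == false only LIVE replicas count
    // (the behaviour described as failing); with true, replicas on
    // decommissioning nodes count as well (the proposed fix).
    static int countUsableReplicas(List<ReplicaState> replicas,
                                   boolean includeDecommissioning) {
        int usable = 0;
        for (ReplicaState state : replicas) {
            if (state == ReplicaState.LIVE
                    || (includeDecommissioning
                        && state == ReplicaState.DECOMMISSIONING)) {
                usable++;
            }
        }
        return usable;
    }

    // Assumption: the last block can be completed once the usable replica
    // count reaches the configured minimum replication.
    static boolean canCompleteLastBlock(List<ReplicaState> replicas,
                                        int minReplication,
                                        boolean includeDecommissioning) {
        return countUsableReplicas(replicas, includeDecommissioning)
                >= minReplication;
    }

    public static void main(String[] args) {
        // All locations of the last block are on decommissioning nodes.
        List<ReplicaState> replicas = Arrays.asList(
                ReplicaState.DECOMMISSIONING,
                ReplicaState.DECOMMISSIONING,
                ReplicaState.DECOMMISSIONING);

        System.out.println("live only: "
                + canCompleteLastBlock(replicas, 1, false));
        System.out.println("live + decommissioning: "
                + canCompleteLastBlock(replicas, 1, true));
    }
}
```

In this model, a block whose replicas all sit on decommissioning nodes can never be completed when only live replicas count, which matches the stuck recovery above. Counting the decommissioning replicas breaks the cycle.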