[
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16993123#comment-16993123
]
Hudson commented on HDFS-14854:
-------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17749 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/17749/])
HDFS-14854. Create improved decommission monitor implementation. (weichiu: rev
c93cb6790e0f1c64efd03d859f907a0522010894)
* (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommissionWithStripedBackoffMonitor.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommissionWithStriped.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminBackoffMonitor.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminMonitorBase.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatusWithBackoffMonitor.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommissionWithBackoffMonitor.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminMonitorInterface.java
> Create improved decommission monitor implementation
> ---------------------------------------------------
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: 012_to_013_changes.diff,
> Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, HDFS-14854.002.patch,
> HDFS-14854.003.patch, HDFS-14854.004.patch, HDFS-14854.005.patch,
> HDFS-14854.006.patch, HDFS-14854.007.patch, HDFS-14854.008.patch,
> HDFS-14854.009.patch, HDFS-14854.010.patch, HDFS-14854.011.patch,
> HDFS-14854.012.patch, HDFS-14854.013.patch, HDFS-14854.014.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current
> decommission monitor implementation, such as:
> * Blocks are replicated sequentially disk by disk and node by node, and
> hence the load is not spread well across the cluster
> * Adding a node for decommission can cause the namenode write lock to be
> held for a long time.
> * Decommissioning nodes floods the replication queue, and under-replicated
> blocks from a future node or disk failure may wait for a long time before they
> are replicated.
> * Blocks pending replication are checked many times under a write lock
> before they are sufficiently replicated, wasting resources.
> In this Jira I propose to create a new implementation of the decommission
> monitor that resolves these issues. As it will be difficult to prove one
> implementation is better than another, the new implementation can be enabled
> or disabled, giving a choice between the existing implementation and the new one.
> I will attach a pdf with some more details on the design and then a version 1
> patch shortly.
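
As an illustration of the enable/disable switch described above, the fragment below sketches how the new monitor might be selected in hdfs-site.xml. This message does not state the property name. The key shown is assumed from the hdfs-default.xml entry this commit edits, and the value is the DatanodeAdminBackoffMonitor class added in the file list above. Verify both against the hdfs-default.xml of the release in use before relying on them.

```xml
<!-- Sketch only: the property name is an assumption, not quoted from this message. -->
<property>
  <name>dfs.namenode.decommission.monitor.class</name>
  <!-- Class added by this commit; the existing behaviour is assumed to
       correspond to DatanodeAdminDefaultMonitor in the same package. -->
  <value>org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor</value>
</property>
```

Leaving the property unset would presumably keep the existing implementation, which matches the issue's intent that the new monitor is optional.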
--
This message was sent by Atlassian Jira
(v8.3.4#803005)