[ https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882329#comment-16882329 ]
Hudson commented on HDFS-12703:
-------------------------------
FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16882 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/16882/])
HDFS-12703. Exceptions are fatal to decommissioning monitor. Contributed
(inigoiri: rev eccc9a40deda212cb367627f6f4cc35f5c619941)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java
> Exceptions are fatal to decommissioning monitor
> -----------------------------------------------
>
> Key: HDFS-12703
> URL: https://issues.apache.org/jira/browse/HDFS-12703
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Daryn Sharp
> Assignee: He Xiaoqiao
> Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-12703.001.patch, HDFS-12703.002.patch,
> HDFS-12703.003.patch, HDFS-12703.004.patch, HDFS-12703.005.patch,
> HDFS-12703.006.patch, HDFS-12703.007.patch, HDFS-12703.008.patch,
> HDFS-12703.009.patch, HDFS-12703.010.patch, HDFS-12703.011.patch,
> HDFS-12703.012.patch, HDFS-12703.013.patch
>
>
> The {{DecommissionManager.Monitor}} runs as a task scheduled on an executor.
> If an exception escapes it, all decommissioning stops until the NN is
> restarted. Per the javadoc for {{executor#scheduleAtFixedRate}}: *If any
> execution of the task encounters an exception, subsequent executions are
> suppressed*. The monitor thread stays alive but blocks waiting for an
> executor task that will never come. The code currently discards the
> returned future, so the exception that aborted the task is lost.
> Failover is insufficient because the task has likely died on the standby
> too. Initializing the replication queues after the transition to active
> will fix the under-replication of blocks on nodes that are currently
> decommissioning, but nodes decommissioned afterwards will never finish.
> The standby must be restarted before failover, and hopefully the error
> condition does not recur.
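The failure mode described above can be reproduced with a plain `ScheduledExecutorService`, outside HDFS. This is an illustrative sketch only: `MonitorSuppressionDemo`, `failsOnThirdRun`, `guarded`, and the other names are hypothetical, the task body is a stand-in that throws on its third run, and the try/catch guard is one common mitigation, not necessarily what the committed patch does. It shows three things: executions stop after the first exception, that exception is recoverable only if the returned `ScheduledFuture` is kept and `get()` is called on it, and catching inside the task keeps the schedule alive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo of scheduleAtFixedRate semantics; this is not HDFS code.
final class MonitorSuppressionDemo {
    private static final Logger LOG =
        Logger.getLogger(MonitorSuppressionDemo.class.getName());

    /** How many times the unguarded task ran, and the exception that stopped it. */
    static final class Outcome {
        final int runs;
        final Throwable cause;

        Outcome(int runs, Throwable cause) {
            this.runs = runs;
            this.cause = cause;
        }
    }

    /** Stand-in for a monitor body: throws on its third execution only. */
    static Runnable failsOnThirdRun(AtomicInteger runs) {
        return () -> {
            if (runs.incrementAndGet() == 3) {
                throw new IllegalStateException("simulated monitor failure");
            }
        };
    }

    /** Hypothetical guard: log a RuntimeException instead of letting it reach the executor. */
    static Runnable guarded(String name, Runnable body) {
        return () -> {
            try {
                body.run();
            } catch (RuntimeException e) {
                LOG.log(Level.WARNING, name + " failed; retrying next period", e);
            }
        };
    }

    /** Unguarded: the third run throws and all later executions are suppressed. */
    static Outcome unguarded() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        ScheduledFuture<?> future = executor.scheduleAtFixedRate(
            failsOnThirdRun(runs), 0, 10, TimeUnit.MILLISECONDS);
        Throwable cause = null;
        try {
            try {
                // A periodic task never completes normally, so get() waits until an
                // execution throws, then rethrows it wrapped in ExecutionException.
                // Discarding the future throws this diagnostic away.
                future.get();
            } catch (ExecutionException e) {
                cause = e.getCause();
            }
            Thread.sleep(100); // ten more periods elapse; the count stays at 3
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return new Outcome(runs.get(), cause);
    }

    /** Guarded: the failure is logged and the schedule keeps running afterwards. */
    static boolean guardedSurvivesFailure() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch fiveRuns = new CountDownLatch(5);
        Runnable body = failsOnThirdRun(runs);
        executor.scheduleAtFixedRate(guarded("monitor", () -> {
            fiveRuns.countDown();
            body.run();
        }), 0, 10, TimeUnit.MILLISECONDS);
        try {
            // Five runs are only possible if execution continued past the failure.
            return fiveRuns.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Outcome o = unguarded();
        System.out.println("unguarded runs = " + o.runs + ", cause = " + o.cause);
        System.out.println("guarded survives failure = " + guardedSurvivesFailure());
    }
}
```

Running `main` should print `unguarded runs = 3, cause = java.lang.IllegalStateException: simulated monitor failure` followed by `guarded survives failure = true`, with the logged warning on stderr. The guard catches `RuntimeException` rather than `Throwable`, so an `Error` such as `OutOfMemoryError` still ends the task; whether that trade-off suits a NameNode monitor is a separate design decision.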
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)