[ https://issues.apache.org/jira/browse/HDFS-12703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845252#comment-16845252 ]
Xue Liu commented on HDFS-12703:
--------------------------------
Hi folks,
We recently observed this issue on our production cluster: some unhandled
exceptions in the DatanodeAdminMonitor thread caused decommissioning to stop.
As the JIRA describes, all subsequent executions of the monitor task were
suppressed.
We are adding error handling to catch the specific exception and will report
which exception it is once the change is running in production.
I will provide a patch with exception handling and, if possible, fix the root
cause of the exception.
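The kind of error handling described above can be sketched in isolation. This is a minimal illustration, not the actual HDFS patch: `MonitorTask` and `checkDecommissioningNodes` are hypothetical stand-ins for the real monitor. The point is that catching exceptions inside `run()` keeps them from reaching the executor, so `scheduleAtFixedRate` keeps scheduling later iterations.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ResilientMonitorDemo {

    /** Hypothetical stand-in for the periodic work of DatanodeAdminMonitor. */
    static final class MonitorTask implements Runnable {
        final AtomicInteger runs = new AtomicInteger();
        final AtomicInteger failures = new AtomicInteger();

        @Override
        public void run() {
            try {
                checkDecommissioningNodes();
            } catch (Exception e) {
                // Log and swallow: if the exception escaped run(),
                // scheduleAtFixedRate would suppress every later execution.
                // (Errors are not caught here and would still kill the task.)
                failures.incrementAndGet();
                System.err.println("Monitor iteration failed, will retry: " + e);
            }
        }

        /** Hypothetical check; throws on the second iteration to simulate a bug. */
        private void checkDecommissioningNodes() {
            if (runs.incrementAndGet() == 2) {
                throw new IllegalStateException("simulated failure");
            }
        }
    }

    /** Schedules the task every 10 ms, lets it run for the given time, then stops it. */
    static MonitorTask runFor(long millis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        MonitorTask task = new MonitorTask();
        executor.scheduleAtFixedRate(task, 0, 10, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(millis);
            executor.shutdownNow();
            executor.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return task;
    }

    public static void main(String[] args) {
        MonitorTask task = runFor(300);
        // The run count keeps growing after the single failure on run 2.
        System.out.println("runs=" + task.runs.get() + ", failures=" + task.failures.get());
    }
}
```

Catching inside `run()` is what keeps the schedule alive. Merely holding on to the returned future would preserve the exception for diagnosis, but it would not resume decommissioning.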
> Exceptions are fatal to decommissioning monitor
> -----------------------------------------------
>
> Key: HDFS-12703
> URL: https://issues.apache.org/jira/browse/HDFS-12703
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> The {{DecommissionManager.Monitor}} runs as a scheduled executor task. If
> an exception occurs, all decommissioning ceases until the NN is restarted.
> Per the javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the
> task encounters an exception, subsequent executions are suppressed*. The
> monitor thread is alive but blocked waiting for an executor task that will
> never come. The code currently discards the returned future, so the actual
> exception that aborted the task is lost.
> Failover is insufficient since the task is also likely dead on the standby.
> Replication queue initialization after the transition to active will fix the
> under-replication of blocks on currently decommissioning nodes, but nodes
> that start decommissioning afterwards will never finish. The standby must be
> restarted prior to failover, and hopefully the error condition does not recur.
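The suppression behavior quoted from the `scheduleAtFixedRate` javadoc can be reproduced outside HDFS. The following is a minimal sketch with hypothetical class and method names (not HDFS code): an unguarded periodic task that throws on its second run is never run again, while the executor's worker thread stays alive. The exception is recoverable only if the caller keeps the returned `ScheduledFuture` and calls `get()`, which reports it as the cause of an `ExecutionException`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class SuppressedTaskDemo {

    /** What happened: how many times the task ran and what aborted it. */
    static final class Outcome {
        final int runs;
        final Throwable cause;

        Outcome(int runs, Throwable cause) {
            this.runs = runs;
            this.cause = cause;
        }
    }

    static Outcome run() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();

        // Unguarded periodic task: the second execution throws, after which
        // the executor never runs it again (but its worker thread stays alive).
        ScheduledFuture<?> future = executor.scheduleAtFixedRate(() -> {
            if (runs.incrementAndGet() == 2) {
                throw new IllegalStateException("simulated monitor bug");
            }
        }, 0, 10, TimeUnit.MILLISECONDS);

        Throwable cause = null;
        try {
            // A periodic task never completes normally, so get() blocks until
            // it is cancelled or throws; the exception arrives as the cause.
            future.get(5, TimeUnit.SECONDS);
        } catch (ExecutionException e) {
            cause = e.getCause();
        } catch (TimeoutException e) {
            cause = e; // The task was still running after 5 s.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        try {
            // Wait several more periods to show that no further runs happen.
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        executor.shutdownNow();
        return new Outcome(runs.get(), cause);
    }

    public static void main(String[] args) {
        Outcome outcome = run();
        System.out.println("runs=" + outcome.runs + ", cause=" + outcome.cause);
    }
}
```

Running it should report two runs and the `IllegalStateException` as the cause. Without the `get()` call, as in the code described above, nothing would surface the exception at all.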