Kihwal Lee created HDFS-8676:
--------------------------------
Summary: Delayed rolling upgrade finalization can cause heartbeat
expiration
Key: HDFS-8676
URL: https://issues.apache.org/jira/browse/HDFS-8676
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Kihwal Lee
Priority: Critical
In large, busy clusters where the deletion rate is also high, a lot of blocks
can pile up in the datanode trash directories until an upgrade is finalized.
When the upgrade is finally finalized, the trash is deleted synchronously in
the service actor thread's context. This blocks the heartbeat and can cause
heartbeat expiration.
We have seen a namenode losing hundreds of nodes after a delayed upgrade
finalization. The deletion of trash directories should be made asynchronous.
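As a rough illustration of the proposed direction (not the actual datanode
code or an eventual patch), the deletion could be handed off to a background
worker so that the calling thread returns immediately and can keep sending
heartbeats. The class name AsyncTrashCleaner and the method deleteAsync are
hypothetical, and the sketch uses only java.nio.file and java.util.concurrent
rather than any Hadoop API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of HDFS. */
final class AsyncTrashCleaner implements AutoCloseable {
    // A single daemon worker, so trash deletion never runs on the caller's
    // thread (in the scenario above, the thread that sends heartbeats).
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "trash-cleaner");
        t.setDaemon(true);
        return t;
    });

    /** Queues deletion of trashDir and returns without waiting for it. */
    Future<?> deleteAsync(Path trashDir) {
        return worker.submit(() -> deleteRecursively(trashDir));
    }

    private static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stops accepting new work; already queued deletions still run. */
    @Override
    public void close() {
        worker.shutdown();
    }
}
```

A caller on the heartbeat path would invoke deleteAsync(trashDir) and carry
on without waiting on the returned Future. A real fix would also need to
decide how to handle new trash arriving while a deletion is in progress and
how to report failures, which this sketch does not address.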