[ https://issues.apache.org/jira/browse/HDFS-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Junping Du updated HDFS-8776:
-----------------------------
Target Version/s: (was: 2.8.0)
> Decom manager should not be active on standby
> ---------------------------------------------
>
> Key: HDFS-8776
> URL: https://issues.apache.org/jira/browse/HDFS-8776
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.0
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
>
> The decommission manager should not be actively processing on the standby.
> On the standby, the decom manager goes through the costly computation of
> determining that every block on the node requires replication, yet doesn't
> queue the blocks for replication because the NN is in standby. While doing
> so it holds the namesystem write lock, which causes a cycle: DNs time out on
> heartbeats or IBRs, the NN purges the call queue of timed-out clients, and
> the NN processes some heartbeats/IBRs before the decom manager locks up the
> namesystem again. Nodes attempting to register will be sending full BRs,
> which are more costly to send and discard than a heartbeat. If a failover is
> required, the standby will likely have to struggle very hard to not GC while
> "catching up" on its queued IBRs while DNs continue to fill the call queue
> and time out.
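
The behaviour the description asks for can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not the HDFS code or
the patch for this issue: HAState, Namesystem, DecomMonitor and run_once are
hypothetical names, and every block on a decommissioning node is assumed to
need replication. The point it shows is that the monitor checks the HA state
before taking the namesystem write lock, so a standby never pays for the scan.

```python
import threading
from enum import Enum


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class Namesystem:
    """Toy stand-in for the NameNode namesystem (hypothetical, not the HDFS API)."""

    def __init__(self, state):
        self.state = state
        self.write_lock = threading.Lock()
        self.write_lock_acquisitions = 0  # counts how often the monitor took the lock

    def is_active(self):
        return self.state is HAState.ACTIVE


class DecomMonitor:
    """Toy decommission monitor that only does work on the active node."""

    def __init__(self, namesystem, decommissioning_nodes):
        self.ns = namesystem
        # Maps a node name to the list of block ids stored on that node.
        self.nodes = decommissioning_nodes
        self.queued_for_replication = []

    def run_once(self):
        """Scan decommissioning nodes once; return the number of blocks scanned."""
        # Bail out before touching the write lock: on a standby the scan
        # result would be discarded anyway, so holding the lock is pure cost.
        if not self.ns.is_active():
            return 0
        with self.ns.write_lock:
            self.ns.write_lock_acquisitions += 1
            scanned = 0
            for blocks in self.nodes.values():
                for block_id in blocks:
                    scanned += 1
                    # Simplifying assumption: every block needs replication.
                    self.queued_for_replication.append(block_id)
            return scanned
```

In this model a standby run returns immediately without acquiring the lock,
and the same monitor starts scanning once the state is switched to active. A
real implementation would also have to handle a state transition that happens
in the middle of a scan, which this sketch does not attempt.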