[ https://issues.apache.org/jira/browse/HDFS-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Junping Du updated HDFS-8776:
-----------------------------
    Target Version/s:   (was: 2.8.0)

> Decom manager should not be active on standby
> ---------------------------------------------
>
>                 Key: HDFS-8776
>                 URL: https://issues.apache.org/jira/browse/HDFS-8776
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>
> The decommission manager should not be actively processing on the standby.
> The decomm manager goes through the costly computation of determining that
> every block on the node requires replication, yet doesn't queue them for
> replication because it's in standby. The decomm manager is holding the
> namesystem write lock, causing DNs to time out on heartbeats or IBRs. The NN
> purges the call queue of timed-out clients, then processes some
> heartbeats/IBRs before the decomm manager locks up the namesystem again.
> Nodes attempting to register will be sending full BRs, which are more costly
> to send and discard than a heartbeat.
> If a failover is required, the standby will likely have to struggle very hard
> to not GC while "catching up" on its queued IBRs while DNs continue to fill
> the call queue and time out.
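
The behaviour the report asks for can be illustrated with a minimal sketch. This is not the Hadoop code or the eventual patch: the class `DecomMonitorSketch`, the `HaState` enum, and every method below are hypothetical names invented for illustration. It assumes only what the description states, namely that the periodic decommission scan takes the namesystem write lock and that its results are discarded on a standby. The sketch checks the HA state before taking the lock, so a standby tick never blocks heartbeat or IBR processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a decommission monitor; not Hadoop's API. */
public class DecomMonitorSketch {
    /** Hypothetical HA states of the NameNode. */
    public enum HaState { ACTIVE, STANDBY }

    // Stand-in for the namesystem lock that the report says is held during the scan.
    private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();
    private final List<String> decommissioningNodes = new ArrayList<>();
    private volatile HaState state;
    private int nodesScanned = 0;

    public DecomMonitorSketch(HaState initialState) {
        this.state = initialState;
    }

    public void setState(HaState newState) {
        this.state = newState;
    }

    public void addDecommissioningNode(String node) {
        decommissioningNodes.add(node);
    }

    public int getNodesScanned() {
        return nodesScanned;
    }

    /**
     * One monitor tick. Returns true if the scan ran, false if it was skipped.
     * The state is checked before the write lock is requested, so a standby
     * never contends for the lock at all.
     */
    public boolean runOnce() {
        if (state != HaState.ACTIVE) {
            return false; // standby: skip the costly scan entirely
        }
        namesystemLock.writeLock().lock();
        try {
            // Re-check under the lock: the state may have changed while waiting.
            if (state != HaState.ACTIVE) {
                return false;
            }
            for (String node : decommissioningNodes) {
                // Placeholder for the per-node block scan described in the report.
                nodesScanned++;
            }
            return true;
        } finally {
            namesystemLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        DecomMonitorSketch monitor = new DecomMonitorSketch(HaState.STANDBY);
        monitor.addDecommissioningNode("dn1");
        System.out.println("standby tick ran scan: " + monitor.runOnce());
        monitor.setState(HaState.ACTIVE); // simulate a failover
        System.out.println("active tick ran scan: " + monitor.runOnce());
    }
}
```

Checking the state before acquiring the lock is the point of the sketch: a guard placed only inside the locked region would still stall heartbeats while the standby waits for and holds the write lock.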