ZhiWei Shi created HDFS-16574:
---------------------------------
Summary: Reduce the time a single FSNamesystem write-lock hold takes to remove the blocks associated with a dead datanode
Key: HDFS-16574
URL: https://issues.apache.org/jira/browse/HDFS-16574
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: ZhiWei Shi
Assignee: ZhiWei Shi
{{BlockManager.removeBlocksAssociatedTo(final DatanodeDescriptor node)}} is implemented as follows:
{code:java}
/** Remove the blocks associated to the given datanode. */
void removeBlocksAssociatedTo(final DatanodeDescriptor node) {
  providedStorageMap.removeDatanode(node);
  final Iterator<BlockInfo> it = node.getBlockIterator();
  while(it.hasNext()) {
    removeStoredBlock(it.next(), node);
  }
  // Remove all pending DN messages referencing this DN.
  pendingDNMessages.removeAllMessagesForDatanode(node);
  node.resetBlocks();
  invalidateBlocks.remove(node);
}
{code}
It runs with the FSNamesystem write lock held and removes every block associated with the dead datanode one by one, which can take a long time when the datanode carries a large number of blocks, e.g. 49.2 s for 22.30M blocks and 7.9 s for 2.84M blocks:
{code:java}
# ${ip-1} has 22304612 blocks
2022-05-08 13:43:35,864 INFO org.apache.hadoop.net.NetworkTopology: Removing a
node: /default-rack/${ip-1}:800
2022-05-08 13:43:35,864 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock
held for 49184 ms via
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1021)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1579)
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:409)
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
java.lang.Thread.run(Thread.java:745)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 49184
# ${ip-2} has 22304612 blocks
2022-05-08 08:11:55,559 INFO org.apache.hadoop.net.NetworkTopology: Removing a
node: /default-rack/${ip-2}:800
2022-05-08 08:11:55,560 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock
held for 7925 ms via
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1021)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1579)
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:409)
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
java.lang.Thread.run(Thread.java:745)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 7925{code}
This blocks every RPC request that needs the FSNamesystem lock, so health-check requests cannot be served in time, which eventually triggers a NameNode failover (a standalone illustration of the lock behaviour follows the log below):
{code:java}
2022-05-08 13:43:32,279 WARN org.apache.hadoop.ha.HealthMonitor:
Transport-level exception trying to monitor health of NameNode at
hd044.corp.yodao.com/10.108.162.60:8000: java.net.SocketTimeoutException: 45000
millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.108.162.60:16861
remote=hd044.corp.yodao.com/10.108.162.60:8000] Call From
hd044.corp.yodao.com/10.108.162.60 to hd044.corp.yodao.com:8000 failed on
socket timeout exception: java.net.SocketTimeoutException: 45000 millis timeout
while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.108.162.60:16861
remote=hd044.corp.yodao.com/10.108.162.60:8000]; For more details see:
http://wiki.apache.org/hadoop/SocketTimeout
2022-05-08 13:43:32,283 INFO org.apache.hadoop.ha.HealthMonitor: Entering state
SERVICE_NOT_RESPONDING
2022-05-08 13:43:32,283 INFO org.apache.hadoop.ha.ZKFailoverController: Local
service NameNode at hd044.corp.yodao.com/10.108.162.60:8000 entered state:
SERVICE_NOT_RESPONDING
2022-05-08 13:43:32,922 INFO
org.apache.hadoop.hdfs.tools.DFSZKFailoverController: -- Local NN thread dump --
Process Thread Dump:
......
Thread 100 (IPC Server handler 0 on 8000):
State: WAITING
Blocked count: 14895292
Waited count: 210385351
Waiting on java.util.concurrent.locks.ReentrantReadWriteLock$FairSync@5723ab57
Stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readLock(FSNamesystemLock.java:144)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readLock(FSNamesystem.java:1556)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1842)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:707)
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
......
Thread 41
(org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@7c351808):
State: RUNNABLE
Blocked count: 10
Waited count: 720146
Stack:
org.apache.hadoop.hdfs.util.LightWeightHashSet.resize(LightWeightHashSet.java:479)
org.apache.hadoop.hdfs.util.LightWeightHashSet.expandIfNecessary(LightWeightHashSet.java:497)
org.apache.hadoop.hdfs.util.LightWeightHashSet.add(LightWeightHashSet.java:244)
org.apache.hadoop.hdfs.server.blockmanagement.UnderReplicatedBlocks.update(UnderReplicatedBlocks.java:312)
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.updateNeededReplications(BlockManager.java:3718)
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeStoredBlock(BlockManager.java:3310)
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlocksAssociatedTo(BlockManager.java:1314)
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDatanode(DatanodeManager.java:638)
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.removeDeadDatanode(DatanodeManager.java:683)
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager.heartbeatCheck(HeartbeatManager.java:407)
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor.run(HeartbeatManager.java:436)
java.lang.Thread.run(Thread.java:745)
.....
-- Local NN thread dump --
2022-05-08 13:43:32,923 INFO org.apache.hadoop.ha.ZKFailoverController:
Quitting master election for NameNode at
hd044.corp.yodao.com/10.108.162.60:8000 and marking that fencing is necessary
2022-05-08 13:43:32,923 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Yielding from election
2022-05-08 13:43:32,925 INFO org.apache.zookeeper.ZooKeeper: Session:
0x97e5d7aac530e03 closed
2022-05-08 13:43:32,925 WARN org.apache.hadoop.ha.ActiveStandbyElector:
Ignoring stale result from old client with sessionId 0x97e5d7aac530e03
2022-05-08 13:43:32,926 INFO org.apache.zookeeper.ClientCnxn: EventThread shut
down
2022-05-08 13:43:36,186 INFO org.apache.hadoop.ha.HealthMonitor: Entering state
SERVICE_HEALTHY{code}
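In the thread dump above, the IPC handlers are parked on the fair FSNamesystemLock read lock while the HeartbeatManager monitor is still inside removeBlocksAssociatedTo under the write lock. Below is a minimal standalone sketch of that behaviour, using only the JDK's {{ReentrantReadWriteLock}} and made-up timings (not the Hadoop classes):
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Standalone illustration of the stall shown in the thread dump above.
 * While one thread holds the write lock of a fair ReentrantReadWriteLock
 * for a long time, every read-lock request parks in the FairSync queue,
 * just like the IPC handlers and the health check do behind the
 * FSNamesystem write lock.
 */
public class WriteLockStallDemo {
  public static void main(String[] args) throws InterruptedException {
    // fair = true, matching the FairSync seen in the dump
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    Thread writer = new Thread(() -> {
      lock.writeLock().lock();
      try {
        Thread.sleep(5_000); // stands in for the 49 s block-removal loop
      } catch (InterruptedException ignored) {
      } finally {
        lock.writeLock().unlock();
      }
    });

    Thread reader = new Thread(() -> {
      long start = System.currentTimeMillis();
      lock.readLock().lock(); // parks here until the writer releases
      try {
        System.out.println("read lock acquired after "
            + (System.currentTimeMillis() - start) + " ms");
      } finally {
        lock.readLock().unlock();
      }
    });

    writer.start();
    Thread.sleep(100); // let the writer grab the lock first
    reader.start();
    writer.join();
    reader.join();
  }
}
{code}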
After it has processed a certain number of blocks or held the lock for a certain amount of time, the removal loop should release the FSNamesystem write lock for a while so that other RPC requests can be processed in time.
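A minimal sketch of that idea, assuming a hypothetical per-hold batch size and plain JDK locking (identifiers such as {{BLOCKS_PER_LOCK_HOLD}} and the {{removeStoredBlock}} stub are illustrative, not the actual Hadoop code or the final patch):
{code:java}
import java.util.Iterator;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.stream.LongStream;

/**
 * Illustrative sketch only: remove the dead datanode's blocks in bounded
 * batches, dropping the write lock between batches so queued readers
 * (getBlockLocations, health checks) get a chance to run.
 */
public class BatchedRemovalSketch {
  // Hypothetical tuning knob; the real change might also bound elapsed time.
  private static final int BLOCKS_PER_LOCK_HOLD = 100_000;

  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);

  void removeBlocksAssociatedTo(Iterator<Long> blockIterator) {
    while (blockIterator.hasNext()) {
      fsLock.writeLock().lock();
      try {
        int processed = 0;
        while (blockIterator.hasNext() && processed < BLOCKS_PER_LOCK_HOLD) {
          removeStoredBlock(blockIterator.next()); // stand-in for the per-block work
          processed++;
        }
      } finally {
        // Releasing here lets waiting read-lock holders run before the next batch.
        fsLock.writeLock().unlock();
      }
    }
  }

  private void removeStoredBlock(long blockId) {
    // stand-in for BlockManager.removeStoredBlock(...)
  }

  public static void main(String[] args) {
    // Simulate a dead datanode with 1M block replicas (hypothetical numbers).
    new BatchedRemovalSketch().removeBlocksAssociatedTo(
        LongStream.range(0, 1_000_000).boxed().iterator());
  }
}
{code}
Whatever the batch boundary is (block count, elapsed time, or both), the iteration and bookkeeping would have to tolerate namesystem changes made by other threads between batches.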