[
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351652#comment-17351652
]
Stephen O'Donnell commented on HDFS-13671:
------------------------------------------
Did anyone spend any time trying to understand what is going wrong with the
FoldedTreeSet structure to cause it to become so slow? In my experience, it
works well for some time, and then it suddenly starts performing poorly. The
only way to recover from that is to restart the service.
I have seen it happen in the namenodes often, but I have also seen the same
behaviour on datanodes which have been up for a long time.
I have not checked the patch in detail, but I suspect there is a bug in
FsDatasetImpl.getSortedFinalizedBlocks
{code}
public List<ReplicaInfo> getSortedFinalizedBlocks(String bpid) {
try (AutoCloseableLock lock = datasetReadLock.acquire()) {
final List<ReplicaInfo> finalized = new ArrayList<ReplicaInfo>(
volumeMap.size(bpid));
for (ReplicaInfo b : volumeMap.replicas(bpid)) {
if (b.getState() == ReplicaState.FINALIZED) {
finalized.add(b);
}
}
return finalized;
}
}
{code}
The lightWeightGset is not a sorted structure like FoldedTreeSet was, so this
method will no longer return the blocks sorted. The DirectoryScanner uses this
method and needs the blocks to be sorted. The DirectoryScanner tests did not
catch this, but TestFsDatasetImpl.testSortedFinalizedBlocksAreSorted did.
> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> ----------------------------------------------------------------------
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0, 3.0.3
> Reporter: Yiqun Lin
> Assignee: Haibin Huang
> Priority: Major
> Attachments: HDFS-13671-001.patch
>
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0
> tid=0x00007fb505b27800 nid=0x94c3 runnable [0x00007fa861361000]
> java.lang.Thread.State: RUNNABLE
> at
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
> at
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
> at
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
> at
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in NameNode, there are mainly two steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks chunk by chunk in a loop.
> Actually the first step should be a more expensive operation and will takes
> more time. However, now we always see NN hangs during the remove block
> operation.
> Looking into this, we introduced a new structure {{FoldedTreeSet}} to have a
> better performance in dealing FBR/IBRs. But compared with early
> implementation in remove-block logic, {{FoldedTreeSet}} seems more slower
> since It will take additional time to balance tree node. When there are large
> block to be removed/deleted, it looks bad.
> For the get type operations in {{DatanodeStorageInfo}}, we only provide the
> {{getBlockIterator}} to return blocks iterator and no other get operation
> with specified block. Still we need to use {{FoldedTreeSet}} in
> {{DatanodeStorageInfo}}? As we know {{FoldedTreeSet}} is benefit for Get not
> Update. Maybe we can revert this to the early implementation.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]