[
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511575#comment-17511575
]
tomscut commented on HDFS-13671:
--------------------------------
Hi [~max2049], we are still using CMS on a cluster without EC data; some
parameter tuning should be able to solve this problem.
Also, how long is your FBR period? If it is 6 hours (the default) and the
cluster is large, it can have an impact on GC. We set it to 3 days.
We use G1GC on a cluster that has this feature enabled and stores EC data. The
main parameters (OpenJDK 1.8) are as follows:
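For reference, the FBR period is controlled by {{dfs.blockreport.intervalMsec}} in hdfs-site.xml. The value below is the 3-day setting in milliseconds (the default is 21600000, i.e. 6 hours); tune it for your own cluster size:
{code:xml}
<property>
  <!-- Full block report interval: 3 days = 3 * 24 * 3600 * 1000 ms -->
  <name>dfs.blockreport.intervalMsec</name>
  <value>259200000</value>
</property>
{code}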
{code:java}
-server -Xmx200g -Xms200g
-XX:MaxDirectMemorySize=2g
-XX:MaxMetaspaceSize=2g
-XX:MetaspaceSize=1g
-XX:+UseG1GC -XX:+UnlockExperimentalVMOptions
-XX:InitiatingHeapOccupancyPercent=75
-XX:G1NewSizePercent=0 -XX:G1MaxNewSizePercent=3
-XX:SurvivorRatio=2 -XX:+DisableExplicitGC -XX:MaxTenuringThreshold=15
-XX:-UseBiasedLocking -XX:ParallelGCThreads=40 -XX:ConcGCThreads=20
-XX:MaxJavaStackTraceDepth=1000000 -XX:MaxGCPauseMillis=200
-verbose:gc -XX:+UnlockDiagnosticVMOptions -XX:+PrintGCDetails
-XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCCause -XX:+PrintGCDateStamps
-XX:+PrintReferenceGC -XX:+PrintHeapAtGC -XX:+PrintAdaptiveSizePolicy
-XX:+G1PrintHeapRegions -XX:+PrintTenuringDistribution
-Xloggc:/data1/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` {code}
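These flags are typically applied through {{HADOOP_NAMENODE_OPTS}} in hadoop-env.sh. An abbreviated sketch (keep only the flags you actually need; the full list is above):
{code:bash}
# hadoop-env.sh -- the NameNode JVM picks these options up at startup
export HADOOP_NAMENODE_OPTS="-server -Xmx200g -Xms200g \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
  -Xloggc:/data1/var/log/hadoop/$USER/gc.log-$(date +'%Y%m%d%H%M') \
  $HADOOP_NAMENODE_OPTS"
{code}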
> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> ----------------------------------------------------------------------
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0, 3.0.3
> Reporter: Yiqun Lin
> Assignee: Haibin Huang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png,
> image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png,
> image-2021-06-18-15-47-04-037.png
>
> Time Spent: 7h 40m
> Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0
> tid=0x00007fb505b27800 nid=0x94c3 runnable [0x00007fa861361000]
> java.lang.Thread.State: RUNNABLE
> at
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
> at
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
> at
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
> at
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
> at
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in the NameNode, there are two main steps:
> * Collect the INodes and all the blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> Actually the first step should be the more expensive operation and take
> more time. However, we now always see the NN hang during the remove-block
> operation.
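> A minimal sketch of the chunk-by-chunk loop described above (the names and
> chunk size here are illustrative, not the actual FSNamesystem code):
> {code:java}
> import java.util.List;
>
> public class ChunkedBlockRemoval {
>   static final int BLOCK_DELETION_INCREMENT = 1000; // illustrative chunk size
>
>   // Remove blocks in fixed-size chunks so the namesystem lock can be
>   // dropped and re-acquired between chunks instead of being held for
>   // the entire deletion.
>   static void removeBlocks(List<Long> collectedBlocks) {
>     int start = 0;
>     while (start < collectedBlocks.size()) {
>       int end = Math.min(start + BLOCK_DELETION_INCREMENT,
>           collectedBlocks.size());
>       // lock(); remove each block in the chunk; unlock();
>       for (Long b : collectedBlocks.subList(start, end)) {
>         // stand-in for blockManager.removeBlock(b)
>       }
>       start = end;
>     }
>   }
> }
> {code}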
> Looking into this: we introduced the new structure {{FoldedTreeSet}} to get
> better performance when processing FBRs/IBRs. But compared with the earlier
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower,
> since it takes additional time to rebalance the tree nodes. When there are
> many blocks to be removed/deleted, this performs badly.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide
> {{getBlockIterator}} to return a block iterator; there is no other get
> operation for a specified block. Do we still need to use {{FoldedTreeSet}}
> in {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits gets, not
> updates. Maybe we can revert this to the earlier implementation.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]