[ https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514638#comment-16514638 ]

Yiqun Lin commented on HDFS-13671:
----------------------------------

Thanks for the comments, everyone!
[[email protected]], we did the GC check and are sure there was no GC problem
while the NN was hung. We also looked into the NN log, which explicitly shows
that the NN was doing the block-removal operation. It lasted around 6 minutes
(from 15:01 to 15:07).
{noformat}
2018-06-06 15:00:59,873 INFO [IPC Server handler 163 on 8020] BlockStateChange: BLOCK* addToInvalidates: blk_1593304672_519567210 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010
2018-06-06 15:00:59,875 INFO [IPC Server handler 163 on 8020] BlockStateChange: BLOCK* addToInvalidates: blk_1593304675_519567213 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010
2018-06-06 15:00:59,879 INFO [IPC Server handler 163 on 8020] BlockStateChange: BLOCK* addToInvalidates: blk_1593304678_519567216 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010
2018-06-06 15:00:59,882 INFO [IPC Server handler 163 on 8020] BlockStateChange: BLOCK* addToInvalidates: blk_1593304679_519567217 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010
.....
2018-06-06 15:07:00,004 INFO [IPC Server handler 163 on 8020] BlockStateChange: BLOCK* addToInvalidates: blk_1595774272_522036817 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010
2018-06-06 15:07:00,005 INFO [IPC Server handler 163 on 8020] BlockStateChange: BLOCK* addToInvalidates: blk_1595774270_522036815 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010
2018-06-06 15:07:00,007 INFO [IPC Server handler 163 on 8020] BlockStateChange: BLOCK* addToInvalidates: blk_1595774256_522036801 xx.xx.xx.xx:50010 xx.xx.xx.xx:50010 xx.xx.xx.xx:500
{noformat}
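As a quick check of the "around 6 minutes" figure, the span between the first and last timestamps quoted above can be computed directly. This is a minimal sketch, assuming every log line starts with a timestamp in the {{yyyy-MM-dd HH:mm:ss,SSS}} form shown; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: elapsed time between two NN log lines. */
final class LogSpan {
    // Assumed timestamp prefix of each log line, e.g. "2018-06-06 15:00:59,873".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    private static final int TIMESTAMP_LENGTH = 23; // length of the prefix above

    /** Parses the leading timestamp of a log line. */
    static LocalDateTime timestampOf(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, TIMESTAMP_LENGTH), FMT);
    }

    /** Milliseconds between the timestamps of two log lines. */
    static long elapsedMillis(String firstLine, String lastLine) {
        return Duration.between(timestampOf(firstLine), timestampOf(lastLine)).toMillis();
    }

    public static void main(String[] args) {
        String first = "2018-06-06 15:00:59,873 INFO [IPC Server handler 163 on 8020] BlockStateChange:";
        String last = "2018-06-06 15:07:00,007 INFO [IPC Server handler 163 on 8020] BlockStateChange:";
        System.out.println(elapsedMillis(first, last) + " ms elapsed");
    }
}
```

For the first and last lines shown, the difference is 360,134 ms, i.e. just over 6 minutes spent by a single IPC handler in {{addToInvalidates}} logging.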

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> ----------------------------------------------------------------------
>
>                 Key: HDFS-13671
>                 URL: https://issues.apache.org/jira/browse/HDFS-13671
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 3.0.3
>            Reporter: Yiqun Lin
>            Priority: Major
>
> The NameNode hung when deleting a large number of files/blocks. The stack trace:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 tid=0x00007fb505b27800 nid=0x94c3 runnable [0x00007fa861361000]
>    java.lang.Thread.State: RUNNABLE
>       at org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>       at org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>       at org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>       at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>       at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>       at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>       at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>       at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> The current deletion logic in the NameNode has two main steps:
> * Collect the INodes and all blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> We would expect the first step to be the more expensive one and to take more
> time. However, we always see the NN hang during the block-removal step.
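The two-step flow above can be modeled roughly as follows. This is a hypothetical sketch, not the actual {{FSNamesystem}} code: the class name, the {{chunkSize}} parameter and the {{LongConsumer}} callback are assumptions made for illustration. It shows why step 2 can still stall the NN: the write lock is released only between chunks, so each lock hold lasts roughly chunk size times the per-block removal cost, and a slow per-block removal (like the {{FoldedTreeSet#removeAndGet}} frames in the stack trace) stretches every hold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongConsumer;

/** Hypothetical model of step 2: chunked block removal under a namesystem-style lock. */
final class ChunkedBlockRemoval {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    private final int chunkSize; // hypothetical knob, not an HDFS config key

    ChunkedBlockRemoval(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be > 0");
        this.chunkSize = chunkSize;
    }

    /** Removes collected block IDs chunk by chunk; returns the number of lock holds taken. */
    int removeBlocks(List<Long> collected, LongConsumer removeOne) {
        int lockHolds = 0;
        int i = 0;
        while (i < collected.size()) {
            fsLock.writeLock().lock();
            lockHolds++;
            try {
                int end = Math.min(i + chunkSize, collected.size());
                for (; i < end; i++) {
                    removeOne.accept(collected.get(i)); // per-block cost is paid while holding the lock
                }
            } finally {
                fsLock.writeLock().unlock(); // other handlers may run only between chunks
            }
        }
        return lockHolds;
    }

    public static void main(String[] args) {
        List<Long> blocks = new ArrayList<>();
        for (long id = 0; id < 2500; id++) blocks.add(id);
        List<Long> removed = new ArrayList<>();
        int holds = new ChunkedBlockRemoval(1000).removeBlocks(blocks, id -> removed.add(id));
        System.out.println(removed.size() + " blocks removed in " + holds + " lock holds");
    }
}
```

With 2500 blocks and a chunk size of 1000, this prints "2500 blocks removed in 3 lock holds"; chunking bounds how many blocks are handled per hold, but not how long each block takes.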
> Looking into this, we found that a new structure, {{FoldedTreeSet}}, was
> introduced to get better performance when handling FBRs/IBRs (full and
> incremental block reports). But compared with the earlier remove-block
> implementation, {{FoldedTreeSet}} seems slower, since it takes additional
> time to rebalance tree nodes. When a large number of blocks are
> removed/deleted, this extra cost adds up badly.
> For get-type operations, {{DatanodeStorageInfo}} only provides
> {{getBlockIterator}}, which returns an iterator over its blocks; there is no
> other get operation for a specified block. Do we still need {{FoldedTreeSet}}
> in {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get
> operations, not Update operations. Maybe we can revert this to the earlier
> implementation.
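To illustrate the Get-versus-Update trade-off, here is a minimal sketch of an intrusive doubly linked block list. It is our assumption of the kind of structure the earlier implementation used (there, each block kept per-storage prev/next links; this sketch simplifies to a single list per block). Removal only unlinks two pointers and needs no key comparisons, while iteration, the only read that {{DatanodeStorageInfo}} exposes per the description above, is still a simple walk. {{java.util.TreeSet}} appears only as a stand-in for a balanced sorted set; it is not {{FoldedTreeSet}}. All class names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical per-storage block list with O(1) removal via links stored in the block. */
final class IntrusiveBlockList {
    /** Hypothetical block node; real HDFS block objects carry much more state. */
    static final class Block {
        final long id;
        Block prev, next;
        Block(long id) { this.id = id; }
    }

    private Block head;
    private int size;

    /** Pushes a block at the head: O(1), no comparisons. */
    void add(Block b) {
        b.prev = null;
        b.next = head;
        if (head != null) head.prev = b;
        head = b;
        size++;
    }

    /** Unlinks a block: O(1), no search. Assumes b is currently in this list. */
    void remove(Block b) {
        if (b.prev != null) b.prev.next = b.next; else head = b.next;
        if (b.next != null) b.next.prev = b.prev;
        b.prev = null;
        b.next = null;
        size--;
    }

    int size() { return size; }

    /** Iteration in head-to-tail order, the only read path this sketch needs. */
    List<Long> ids() {
        List<Long> out = new ArrayList<>();
        for (Block b = head; b != null; b = b.next) out.add(b.id);
        return out;
    }

    public static void main(String[] args) {
        IntrusiveBlockList list = new IntrusiveBlockList();
        TreeSet<Long> tree = new TreeSet<>(); // stand-in balanced tree (red-black), not FoldedTreeSet
        List<Block> blocks = new ArrayList<>();
        for (long id = 1; id <= 5; id++) {
            Block b = new Block(id);
            blocks.add(b);
            list.add(b);
            tree.add(id);
        }
        list.remove(blocks.get(2)); // unlink block 3 directly
        tree.remove(3L);            // O(log n) comparisons plus possible rebalancing
        System.out.println(list.ids() + " " + tree);
    }
}
```

Running it prints "[5, 4, 2, 1] [1, 2, 4, 5]": both structures drop block 3, but the list does so without comparing keys, and it keeps head-insertion order rather than sorted order.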



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
