[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893844#comment-16893844 ]
Chen Zhang edited comment on HDFS-14657 at 7/26/19 2:06 PM:
------------------------------------------------------------

Thanks [~shv] for your comments. You are right: releasing the NN lock in the middle of the loop will cause a ConcurrentModificationException. This patch is ported from our internal 2.6 branch; the implementation has changed a lot on trunk, and I didn't check all the details. I just want to propose this demo solution and hear people's feedback. If the community thinks this solution is feasible, I'll try to work out a complete patch on trunk next week, test it on our cluster, and post some performance numbers.

> Refine NameSystem lock usage during processing FBR
> --------------------------------------------------
>
>                 Key: HDFS-14657
>                 URL: https://issues.apache.org/jira/browse/HDFS-14657
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chen Zhang
>            Assignee: Chen Zhang
>            Priority: Major
>         Attachments: HDFS-14657-001.patch, HDFS-14657.002.patch
>
>
> Disks with 12 TB capacity are common today, which means full block reports (FBRs) are much larger than before. The Namenode holds the NameSystem lock while processing the block report for each storage, which can take quite a long time.
> On our production environment, processing a large FBR usually causes longer RPC queue times, which impacts client latency, so we did some simple work on refining the lock usage, which improved the p99 latency significantly.
> In our solution, the BlockManager releases the NameSystem write lock and re-acquires it every 5000 blocks (by default) while processing an FBR. With the fair lock, all pending RPC requests can be processed before the BlockManager re-acquires the write lock.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
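[Editor's note] The chunked lock release described in the issue, together with the ConcurrentModificationException pitfall raised in the comment, can be sketched as below. This is a minimal standalone illustration, not the actual patch: the class and method names (ChunkedFbrProcessor, processBlock) are hypothetical, and a fair java.util.concurrent ReentrantReadWriteLock stands in for the NameSystem lock. Copying the report before iterating is one way to avoid the iterator invalidation that dropping the lock mid-loop can cause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of chunked FBR processing: release and re-acquire
// the write lock every BLOCKS_PER_LOCK_HOLD blocks so that queued RPC
// handlers can run in between. Names are illustrative, not HDFS APIs.
public class ChunkedFbrProcessor {
    static final int BLOCKS_PER_LOCK_HOLD = 5000;

    // Fair mode hands the lock to the longest-waiting thread, so pending
    // RPCs get in whenever the report processor yields.
    private final ReentrantReadWriteLock namesystemLock =
        new ReentrantReadWriteLock(true);

    int processed = 0;

    void processReport(List<Long> reportedBlocks) {
        // Snapshot the report first: iterating a structure that other
        // threads may mutate while the lock is dropped is what would
        // trigger a ConcurrentModificationException.
        List<Long> snapshot = new ArrayList<>(reportedBlocks);
        int sinceAcquire = 0;
        namesystemLock.writeLock().lock();
        try {
            for (Long block : snapshot) {
                processBlock(block);
                if (++sinceAcquire >= BLOCKS_PER_LOCK_HOLD) {
                    sinceAcquire = 0;
                    // Yield the lock, then take it back; fair ordering
                    // lets queued waiters run first.
                    namesystemLock.writeLock().unlock();
                    namesystemLock.writeLock().lock();
                }
            }
        } finally {
            namesystemLock.writeLock().unlock();
        }
    }

    void processBlock(Long blockId) {
        processed++; // placeholder for per-block report handling
    }

    public static void main(String[] args) {
        ChunkedFbrProcessor p = new ChunkedFbrProcessor();
        List<Long> report = new ArrayList<>();
        for (long i = 0; i < 12000; i++) report.add(i);
        p.processReport(report); // yields the lock twice along the way
        System.out.println(p.processed); // 12000
    }
}
```

Note the trade-off: work done before the lock is yielded must remain valid after re-acquisition, since the namespace can change in the gap; that is the part a real trunk patch would need to verify carefully.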