[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888568#comment-16888568
 ] 

He Xiaoqiao edited comment on HDFS-14657 at 7/19/19 6:27 AM:
-------------------------------------------------------------

Thanks [~zhangchen] for filing this JIRA; it is a very interesting improvement.
{quote}
1. Add a report lock to DatanodeDescriptor.
2. Before processing the FBR and IBR, BlockManager should get the report lock 
for that node first.
3. IBR must wait until FBR processing completes, even though the write lock 
may be released and re-acquired many times while processing the FBR.
{quote}
Could you offer some more information about this improvement? It would be 
very helpful for reviewers, in my opinion.
IIUC, it changes only #blockReport processing in the NameNode (not in the 
DataNode) and only for a single DataNode: it holds a lock per DataNode and 
ensures no meta changes for that node while #blockReport is being processed. 
So I think the risk of inconsistency is under control.
[~shv], would you like to share any further suggestions?
+ cc [~linyiqun], [~xkrogen]
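
A minimal Java sketch of the three quoted steps may help reviewers follow the 
locking order. It is not taken from HDFS-14657-001.patch: DatanodeSketch, 
ReportLockSketch, and the method names are hypothetical stand-ins for 
DatanodeDescriptor and BlockManager, and a fair ReentrantReadWriteLock is 
assumed in place of the NameSystem lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for DatanodeDescriptor: it only carries the
// per-node report lock (step 1) and a log of applied reports.
class DatanodeSketch {
    final ReentrantLock reportLock = new ReentrantLock();
    final List<String> appliedReports = new ArrayList<>();
}

// Hypothetical stand-in for BlockManager; an illustration, not the patch.
class ReportLockSketch {
    // Fair lock assumed in place of the NameSystem lock.
    private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock(true);

    // FBR: the report lock is held for the whole report (step 2), while the
    // write lock is released and re-acquired once per batch.
    void processFullReport(DatanodeSketch dn, int totalBlocks, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        dn.reportLock.lock();
        try {
            for (int start = 0; start < totalBlocks; start += batchSize) {
                int end = Math.min(start + batchSize, totalBlocks);
                nsLock.writeLock().lock();
                try {
                    dn.appliedReports.add("FBR[" + start + "," + end + ")");
                } finally {
                    // Other threads can take the lock here, but an IBR for
                    // this node is still blocked on reportLock (step 3).
                    nsLock.writeLock().unlock();
                }
            }
        } finally {
            dn.reportLock.unlock();
        }
    }

    // IBR: takes the same per-node report lock first, then the write lock.
    void processIncrementalReport(DatanodeSketch dn, String blockId) {
        dn.reportLock.lock();
        try {
            nsLock.writeLock().lock();
            try {
                dn.appliedReports.add("IBR(" + blockId + ")");
            } finally {
                nsLock.writeLock().unlock();
            }
        } finally {
            dn.reportLock.unlock();
        }
    }
}
```

Because both paths take the report lock before the write lock, an IBR for a 
node is applied either entirely before or entirely after that node's FBR, 
never between two FBR batches. Reports from other DataNodes use a different 
report lock and are not blocked by it.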


> Refine NameSystem lock usage during processing FBR
> --------------------------------------------------
>
>                 Key: HDFS-14657
>                 URL: https://issues.apache.org/jira/browse/HDFS-14657
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chen Zhang
>            Assignee: Chen Zhang
>            Priority: Major
>         Attachments: HDFS-14657-001.patch
>
>
> Disks with 12TB capacity are very common today, which means the FBR size is 
> much larger than before. The NameNode holds the NameSystem lock while 
> processing the block report for each storage, which might take quite a long 
> time.
> In our production environment, processing a large FBR usually causes a longer 
> RPC queue time, which impacts client latency, so we did some simple work on 
> refining the lock usage, which improved the p99 latency significantly.
> In our solution, BlockManager releases the NameSystem write lock and requests 
> it again every 5000 blocks (by default) while processing an FBR. With the 
> fair lock, all the RPC requests can be processed before BlockManager 
> re-acquires the write lock.
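
The batching described above can be sketched in Java as follows. This is an 
illustration under stated assumptions, not the attached patch: 
BatchedReportSketch and its methods are hypothetical names, the per-block 
work is a placeholder counter, and a fair ReentrantReadWriteLock is assumed 
in place of the NameSystem lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; names and structure are assumptions, not HDFS code.
class BatchedReportSketch {
    // Mirrors the "every 5000 blocks (by default)" value in the description.
    static final int DEFAULT_BATCH_SIZE = 5000;

    // Fair mode: a thread calling lock() queues behind threads that are
    // already waiting, so requests queued during a batch are served before
    // this thread gets the write lock back.
    private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock(true);
    private long processedBlocks = 0;

    // Processes the report in batches and returns how many times the write
    // lock was acquired.
    int processReport(long[] blockIds, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int acquisitions = 0;
        for (int start = 0; start < blockIds.length; start += batchSize) {
            int end = Math.min(start + batchSize, blockIds.length);
            nsLock.writeLock().lock();
            acquisitions++;
            try {
                for (int i = start; i < end; i++) {
                    processedBlocks++; // placeholder for real per-block work
                }
            } finally {
                // Releasing here bounds how long any one hold of the lock lasts.
                nsLock.writeLock().unlock();
            }
        }
        return acquisitions;
    }

    // Readers take the read lock, as other RPC handlers would.
    long processedBlocks() {
        nsLock.readLock().lock();
        try {
            return processedBlocks;
        } finally {
            nsLock.readLock().unlock();
        }
    }
}
```

With this shape, the longest single hold of the write lock is bounded by the 
batch size rather than by the size of the whole report, at the cost of more 
lock acquisitions per report.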



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
