[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896420#comment-16896420 ]

Konstantin Shvachko commented on HDFS-14657:
--------------------------------------------

With branch-2 (including 2.6) this change seems even trickier, because 
{{DatanodeStorageInfo}} there uses a linked list (aka "triplets") to track 
replicas belonging to the DataNode storage. This was changed by HDFS-9260 for 
3.0, but the change was reported to cause serious performance issues.
For this issue, I believe that if you release the lock while iterating over 
the storage's blocks, the iterator may find itself on an isolated chain of 
the list after the lock is reacquired. I also don't know what happens to the 
replicas that were not reported by the DN and are supposed to be deleted from 
the NameNode. With "triplets" they are collected at the end of the list once 
all reported replicas are processed, but if you release and re-acquire the 
lock, the integrity of the list may be broken, so you could remove replicas 
that were not supposed to be removed.
I am just saying that things are tricky here. I would be surprised if you 
could navigate around these obstacles, but it would be a big win if you did.
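To illustrate the isolated-chain hazard in the abstract (this is a hypothetical sketch with made-up names, not HDFS code — the real "triplets" structure is more involved): if the list is mutated while the lock is released, the node the iterator holds can be detached from the list, and the traversal silently ends early.

```java
import java.util.ArrayList;
import java.util.List;

public class IsolatedChainDemo {
    static final class Node {
        final String id;
        Node next;
        Node(String id) { this.id = id; }
    }

    // Walks the list a -> b -> c -> d, simulating a lock release after the
    // second node, during which a concurrent thread unlinks that node.
    public static List<String> run() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.next = b; b.next = c; c.next = d;

        List<String> visited = new ArrayList<>();
        Node cur = a;
        while (cur != null) {
            visited.add(cur.id);
            if (cur == b) {
                // "Lock released" here: a concurrent remover unlinks b,
                // leaving the iterator stranded on an isolated chain.
                a.next = c;     // list is now a -> c -> d
                b.next = null;  // b's chain is detached
            }
            cur = cur.next;     // iterator follows b.next == null and stops
        }
        return visited;         // c and d were silently skipped
    }

    public static void main(String[] args) {
        System.out.println(IsolatedChainDemo.run()); // prints [a, b]
    }
}
```

The same mechanism can also cut off the tail of the list where, per the comment above, the not-reported replicas are collected, which is how correct replicas could end up removed.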

> Refine NameSystem lock usage during processing FBR
> --------------------------------------------------
>
>                 Key: HDFS-14657
>                 URL: https://issues.apache.org/jira/browse/HDFS-14657
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chen Zhang
>            Assignee: Chen Zhang
>            Priority: Major
>         Attachments: HDFS-14657-001.patch, HDFS-14657.002.patch
>
>
> Disks with 12 TB capacity are common today, which means full block reports 
> (FBRs) are much larger than before. The NameNode holds the NameSystem lock 
> while processing the block report for each storage, which can take quite a 
> long time.
> In our production environment, processing a large FBR usually causes longer 
> RPC queue times, which impacts client latency, so we did some simple work on 
> refining the lock usage, which improved the p99 latency significantly.
> In our solution, the BlockManager releases the NameSystem write lock and 
> re-acquires it after every 5000 blocks (by default) while processing an FBR. 
> With the fair lock, all queued RPC requests can be processed before the 
> BlockManager re-acquires the write lock.
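The chunked-locking idea in the description can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual BlockManager code; it uses a fair {{ReentrantReadWriteLock}} so that threads waiting for the lock are served in roughly FIFO order whenever the processor drops the write lock between chunks.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ChunkedReportProcessor {
    // 5000 blocks per lock hold, matching the default mentioned in the issue.
    private static final int BLOCKS_PER_LOCK = 5000;

    // Fair mode: waiting RPC handlers acquire the lock before the processor
    // can re-acquire it, bounding their queue time.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    public int process(List<Long> reportedBlocks) {
        int processed = 0;
        int i = 0;
        while (i < reportedBlocks.size()) {
            lock.writeLock().lock();
            try {
                int end = Math.min(i + BLOCKS_PER_LOCK, reportedBlocks.size());
                for (; i < end; i++) {
                    processBlock(reportedBlocks.get(i));
                    processed++;
                }
            } finally {
                lock.writeLock().unlock(); // queued RPCs get to run here
            }
        }
        return processed;
    }

    private void processBlock(long blockId) {
        // placeholder for per-replica bookkeeping against NameNode state
    }
}
```

Note this sketch sidesteps exactly the problem raised in the comment above: any state the loop carries across an unlock (here just the index {{i}}, but in the real code an iterator over the storage's block list) must remain valid after other threads have mutated the namespace.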



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
