[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

Chen Zhang (JIRA) Wed, 31 Jul 2019 01:32:33 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896917#comment-16896917
 ]


Chen Zhang commented on HDFS-14657:
-----------------------------------

Thanks [~shv], but sorry, I can't see any problem with this change on the 2.6 version.
{quote}I believe when you release the lock while iterating over the storage 
blocks, the iterator may find itself in an isolated chain of the list after 
reacquiring the lock
{quote}
It won't happen, because processReport doesn't iterate over the storage blocks
in 2.6. The whole FBR procedure (for each storage) can be simplified like this:

 
 # Insert a delimiter at the head of this storage's block list (the triplets;
it's actually a doubly linked list, so I'll refer to it as the block list for
simplicity).
 # Start a loop to iterate through the block report:
 ## Get a block from the report.
 ## Use the block to get the stored BlockInfo object from BlockMap.
 ## Check the status of the block, and add the block to the corresponding
set (toAdd, toUc, toInvalidate, toCorrupt).
 ## Move the block to the head of the block list (which places the block
before the delimiter).
 # Start a loop to iterate through the block list, find the blocks after the
delimiter, and add them to the toRemove set.

My proposal in this Jira is to release and re-acquire the NN lock between steps
2.3 and 2.4. This won't affect the correctness of the block report procedure,
for the following reasons:
 # All the reported blocks will end up stored before the delimiter.
 # If any other thread acquires the NN lock before 2.4 and adds some new
blocks, they will be added at the head of the list.
 # If any other thread acquires the NN lock before 2.4 and removes some blocks,
it won't affect the loop in step 2. (Please note that the delimiter can't be
removed by other threads.)
 # All the blocks after the delimiter should be removed.
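
To make the argument above concrete, here is a minimal toy model of the delimiter approach. It is only a sketch, not the actual BlockManager code: the class name FbrDelimiterSketch, the DELIMITER sentinel value, the use of java.util.LinkedList in place of the triplets, and the Runnable that stands in for "any other thread" are all assumptions made for illustration. For determinism the other thread's work is run inline while the write lock is released.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of one storage's block list; hypothetical, not BlockManager code. */
class FbrDelimiterSketch {
    static final Long DELIMITER = Long.MIN_VALUE;   // hypothetical sentinel entry

    final LinkedList<Long> blockList = new LinkedList<>();
    final ReentrantReadWriteLock nnLock = new ReentrantReadWriteLock(true);

    FbrDelimiterSketch(List<Long> stored) {
        blockList.addAll(stored);
    }

    /** Other threads add new blocks at the head of the list (reason 2). */
    void addBlock(long id) {
        nnLock.writeLock().lock();
        try {
            blockList.addFirst(id);
        } finally {
            nnLock.writeLock().unlock();
        }
    }

    /** Other threads may remove blocks, but never the delimiter (reason 3). */
    void removeBlock(long id) {
        nnLock.writeLock().lock();
        try {
            blockList.remove(Long.valueOf(id));
        } finally {
            nnLock.writeLock().unlock();
        }
    }

    /** Returns the blocks left after the delimiter, i.e. the toRemove set. */
    List<Long> processReport(List<Long> report, int batchSize, Runnable otherThread) {
        List<Long> toRemove = new ArrayList<>();
        int processed = 0;
        nnLock.writeLock().lock();
        try {
            blockList.addFirst(DELIMITER);                      // step 1
            for (Long reported : report) {                      // step 2
                boolean stored = blockList.contains(reported);  // steps 2.1 to 2.3
                if (++processed % batchSize == 0) {
                    // Release and re-acquire the lock between 2.3 and 2.4.
                    nnLock.writeLock().unlock();
                    try {
                        otherThread.run();
                    } finally {
                        nnLock.writeLock().lock();
                    }
                }
                // Step 2.4: move the block to the head, before the delimiter.
                if (stored && blockList.remove(reported)) {
                    blockList.addFirst(reported);
                }
            }
            // Step 3: everything after the delimiter was not reported.
            int d = blockList.indexOf(DELIMITER);
            toRemove.addAll(blockList.subList(d + 1, blockList.size()));
            blockList.remove(DELIMITER);
        } finally {
            nnLock.writeLock().unlock();
        }
        return toRemove;
    }

    public static void main(String[] args) {
        FbrDelimiterSketch s =
            new FbrDelimiterSketch(Arrays.asList(1L, 2L, 3L, 4L, 5L));
        List<Long> toRemove = s.processReport(Arrays.asList(1L, 2L, 3L), 2,
            () -> { s.addBlock(9L); s.removeBlock(5L); });
        System.out.println(toRemove);   // prints [4]
    }
}
```

In this run the storage holds blocks 1 to 5 and reports 1 to 3. While the lock is released, block 9 is added and block 5 is removed. Block 9 lands before the delimiter, so only the unreported block 4 ends up in toRemove.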

According to the reasons described above, the following problem you mentioned 
also won't happen:
{quote}you may remove replicas that were not supposed to be removed
{quote}
 

I agree with you that things are tricky here, but this change is quite simple,
and I think we can still make its impact clear.

> Refine NameSystem lock usage during processing FBR
> --------------------------------------------------
>
>                 Key: HDFS-14657
>                 URL: https://issues.apache.org/jira/browse/HDFS-14657
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chen Zhang
>            Assignee: Chen Zhang
>            Priority: Major
>         Attachments: HDFS-14657-001.patch, HDFS-14657.002.patch
>
>
> Disks with 12TB capacity are very common today, which means the FBR size is
> much larger than before. The NameNode holds the NameSystemLock while
> processing the block report for each storage, which might take quite a long
> time.
> In our production environment, processing a large FBR usually causes a longer
> RPC queue time, which impacts client latency, so we did some simple work on
> refining the lock usage, which improved the p99 latency significantly.
> In our solution, BlockManager releases the NameSystem write lock and requests
> it again every 5000 blocks (by default) while processing an FBR. With the
> fair lock, all the queued RPC requests can be processed before BlockManager
> re-acquires the write lock.
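
The fair-lock behaviour described above can be illustrated with a small standalone sketch. It assumes a plain java.util.concurrent ReentrantReadWriteLock constructed in fair mode rather than the real NameSystem lock; the class name FairLockYieldSketch and the event labels are hypothetical. A reader thread standing in for a queued RPC waits on the lock while the writer holds it, and the writer releases and immediately re-requests the lock between two batches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: a queued reader gets in when the writer yields a fair lock. */
class FairLockYieldSketch {
    static List<String> run() {
        ReentrantReadWriteLock nnLock = new ReentrantReadWriteLock(true); // fair mode
        List<String> events = Collections.synchronizedList(new ArrayList<>());

        // Stand-in for a client RPC that needs the read lock.
        Thread rpc = new Thread(() -> {
            nnLock.readLock().lock();
            try {
                events.add("rpc");
            } finally {
                nnLock.readLock().unlock();
            }
        });

        nnLock.writeLock().lock();
        try {
            rpc.start();
            // Wait until the RPC thread is really queued behind the write lock.
            while (!nnLock.hasQueuedThread(rpc)) {
                Thread.yield();
            }
            events.add("batch-1");
        } finally {
            nnLock.writeLock().unlock();   // yield after the first batch
        }

        // In fair mode this request queues behind the waiting reader.
        nnLock.writeLock().lock();
        try {
            events.add("batch-2");
        } finally {
            nnLock.writeLock().unlock();
        }

        try {
            rpc.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints [batch-1, rpc, batch-2]
    }
}
```

With a non-fair lock the writer could barge back in before the reader runs, so the ordering shown here depends on constructing the lock with fairness enabled.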



