[
https://issues.apache.org/jira/browse/HDFS-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032723#comment-15032723
]
Jing Zhao commented on HDFS-9381:
---------------------------------
Thanks for the discussion, Uma and Zhe!
bq. To determine whether the optimization justifies the added complexity, I
think we can create a more concrete example.
Yeah, I agree that a more concrete example and some perf numbers would help us
understand the optimization better.
bq. I think besides reducing locking contention, this change also speeds up the
recovery of non-striping blocks. E.g., when a rack fails, there could be a lot
of striped block recovery work waiting. They could block regular recovery tasks.
When we have a lot of missing blocks/replicas (e.g., caused by DataNode
failures or even a rack failure), the replication monitor handles only a
limited number of blocks in each iteration, so some iterations may be wasted
on checking this type of striped block. However, because of the longer
processing time, it is also possible that the striped blocks have a higher
chance of being updated in the UC queue before the replication monitor
processes them for the first time. Also, striped blocks are more likely to be
spread across multiple racks, so a single rack failure may cause only a single
internal block to go missing for a striped block group. So it feels like the
scenarios are more complicated here.
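To make the rack-failure point a bit more concrete, here is a rough back-of-the-envelope sketch. It is toy arithmetic, not HDFS code: the class and method names are made up, it assumes an RS(6,3) layout (6 data + 3 parity internal blocks), and it assumes the internal blocks of a group are spread as evenly as possible over the racks.

```java
/** Toy arithmetic only; the names here are hypothetical and not part of HDFS. */
class RackLossSketch {

    /**
     * Worst-case number of internal blocks of one striped group that go
     * missing when a single rack fails, assuming the group's internal blocks
     * are spread as evenly as possible over the given number of racks.
     */
    static int worstCaseLoss(int internalBlocks, int racks) {
        return (internalBlocks + racks - 1) / racks; // ceiling division
    }

    /** A group can still be reconstructed while the loss does not exceed the parity count. */
    static boolean recoverable(int lost, int parityBlocks) {
        return lost <= parityBlocks;
    }

    public static void main(String[] args) {
        int data = 6;   // assumed RS(6,3) layout
        int parity = 3;
        for (int racks : new int[] {3, 5, 9}) {
            int lost = worstCaseLoss(data + parity, racks);
            System.out.println(racks + " racks: up to " + lost
                + " internal block(s) lost per group, recoverable="
                + recoverable(lost, parity));
        }
    }
}
```

Under these assumptions, a group spread over 9 racks loses at most one internal block to a rack failure, so each affected group needs only one reconstruction task. How many such groups end up queued ahead of regular blocks still depends on the real placement, which this sketch does not model.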
> When same block came for replication for Striped mode, we can move that block
> to PendingReplications
> ----------------------------------------------------------------------------------------------------
>
> Key: HDFS-9381
> URL: https://issues.apache.org/jira/browse/HDFS-9381
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: erasure-coding, namenode
> Affects Versions: 3.0.0
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Attachments: HDFS-9381-02.patch, HDFS-9381-03.patch,
> HDFS-9381.00.patch, HDFS-9381.01.patch
>
>
> I noticed that we currently just return null if the block already exists
> in pendingReplications in the replication flow for striped blocks.
> {code}
>     if (block.isStriped()) {
>       if (pendingNum > 0) {
>         // Wait the previous recovery to finish.
>         return null;
>       }
> {code}
> If we just return null here and neededReplications contains only a few
> blocks (by default, fewer than numliveNodes*2), then the same blocks can be
> picked again from neededReplications in the next loop, because we are not
> removing the element from neededReplications. Since this replication process
> has to take the FSNamesystem lock to do its work, we may spend some time
> unnecessarily in every loop.
> So my suggestion/improvement is:
> Instead of just returning null, how about incrementing pendingReplications
> for this block and removing it from neededReplications? Another point to
> consider here: to add a block to pendingReplications, we generally need a
> target, which is simply the node to which we issued the replication command.
> Later, after the replication succeeds and the DN reports it, the block is
> removed from pendingReplications in the NN's addBlock.
> Since this block was newly picked from neededReplications, we would not
> have selected a target yet. So which target should be passed to
> pendingReplications if we add this block? One option I am thinking of is to
> pass srcNode itself as the target for this special condition. If the block
> is really missing, srcNode will not report it, so the block will not be
> removed from pendingReplications. When it times out, it will be considered
> for replication again, and at that point the regular replication flow will
> find an actual target for it.
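
For illustration, the difference between the two behaviours described above can be sketched with a small toy model. This is not the BlockManager code and not the attached patch: the class, the method, the tick-based timeout, and the single block whose earlier recovery stays in flight for the whole run are all assumptions made for the example. It only counts how often the monitor re-examines a striped block it cannot schedule.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Queue;

/** Toy model only; none of these names are real HDFS classes or methods. */
class StripedPendingSketch {

    /**
     * Counts how many times the monitor picks up a striped block whose earlier
     * recovery is still in flight (the "pendingNum > 0" case).
     *
     * @param moveToPending false: leave the block in the needed queue (return null);
     *                      true: move it to a pending map with a timeout instead
     */
    static int wastedChecks(boolean moveToPending, int iterations, int timeoutTicks) {
        Queue<String> needed = new ArrayDeque<>();
        Map<String, Integer> pendingExpiry = new HashMap<>();
        needed.add("stripedBlk");
        int wasted = 0;

        for (int tick = 0; tick < iterations; tick++) {
            // Timed-out pending entries go back to the needed queue.
            Iterator<Map.Entry<String, Integer>> it = pendingExpiry.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Integer> e = it.next();
                if (e.getValue() <= tick) {
                    needed.add(e.getKey());
                    it.remove();
                }
            }

            String block = needed.peek();
            if (block == null) {
                continue; // nothing to examine in this iteration
            }

            // The block is examined under the lock but cannot be scheduled.
            wasted++;
            if (moveToPending) {
                // Proposed behaviour: park the block until the timeout fires.
                needed.poll();
                pendingExpiry.put(block, tick + timeoutTicks);
            }
            // Otherwise the block stays queued and is picked again next time.
        }
        return wasted;
    }

    public static void main(String[] args) {
        System.out.println("return null:     " + wastedChecks(false, 20, 5));
        System.out.println("move to pending: " + wastedChecks(true, 20, 5));
    }
}
```

In this model the first variant re-checks the block in every iteration, while the second re-checks it only once per timeout period. Whether that saving justifies the extra bookkeeping in the real code is exactly what the discussion above is about.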