[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845046#comment-13845046
 ] 

Vinay commented on HDFS-5496:
-----------------------------

bq. For (4), looks like currently we only retrieve metrics information from 
postponedMisreplicatedBlocks and we always check if the corresponding DNs are 
still stale before we make INVALIDATE decision. Thus it should be safe if we 
delay its initialization. 
For this I am trying to make some changes in the patch. I hope the next patch 
will include this.
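As a hedged illustration of the point quoted above (hypothetical class and 
method names, not the actual BlockManager code): an INVALIDATE decision is only 
acted on after re-checking that the reporting DataNode is no longer stale, so 
populating postponedMisreplicatedBlocks lazily should stay safe.
{code}
import java.util.HashSet;
import java.util.Set;

// Sketch only: hypothetical helper, not the real NameNode data structures.
class PostponedInvalidationCheck {
  private final Set<String> postponedMisreplicatedBlocks = new HashSet<>();

  // Returns true if the excess replica reported by the given DN may be
  // invalidated now; otherwise the block is postponed and revisited later.
  boolean mayInvalidate(String blockId, boolean dataNodeStillStale) {
    if (dataNodeStillStale) {
      postponedMisreplicatedBlocks.add(blockId);
      return false;
    }
    postponedMisreplicatedBlocks.remove(blockId);
    return true;
  }
}
{code}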
bq. For (2), currently we add under-replicated blocks into neededReplications 
when 1) initially populating the replication queue, 2) checking replication 
when finalizing an under-construction file, 3) checking replication progress 
for decommissioning DN, and 4) pending replicas timeout. Delaying 1) and making 
it happen in parallel with 2)~4) should also be safe.
I guess this is already in place, i.e. under-replicated blocks are not added to 
neededReplications in {{processMisReplicatedBlock(..)}}:
{code}
    if (!block.isComplete()) {
      // Incomplete blocks are never considered mis-replicated --
      // they'll be reached when they are completed or recovered.
      return MisReplicationResult.UNDER_CONSTRUCTION;
    }
{code}
bq. For the current patch, I understand we need a new iterator that can iterate 
the blocksMap and not throw exception when concurrent modifications happen. 
However, I guess we may only need to define a new iterator and do not need to 
define the new BlocksMapGSet here. Also, since the new iterator shares most of 
the code with the existing LightWeightGSet#SetIterator, maybe we can simply 
extend SetIterator here?
Yes. Sure. 
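For illustration, a minimal self-contained sketch of such an iterator (not the 
actual LightWeightGSet#SetIterator, whose internals differ): it walks an array 
of singly-linked hash buckets without tracking a modification count, so 
concurrent inserts or removals never make it throw 
ConcurrentModificationException; at worst it skips or revisits entries modified 
mid-scan.
{code}
import java.util.Iterator;
import java.util.NoSuchElementException;

class TolerantBucketIterator<E> implements Iterator<E> {
  // One entry of a singly-linked bucket chain.
  static class Node<E> {
    final E element;
    Node<E> nextNode;
    Node(E element) { this.element = element; }
  }

  private final Node<E>[] buckets;  // hash buckets of the set being iterated
  private int bucketIndex = -1;
  private Node<E> next;

  TolerantBucketIterator(Node<E>[] buckets) {
    this.buckets = buckets;
    advanceBucket();
  }

  // Move to the head of the next non-empty bucket; no modCount check anywhere.
  private void advanceBucket() {
    next = null;
    while (next == null && ++bucketIndex < buckets.length) {
      next = buckets[bucketIndex];
    }
  }

  @Override
  public boolean hasNext() {
    return next != null;
  }

  @Override
  public E next() {
    if (next == null) {
      throw new NoSuchElementException();
    }
    E result = next.element;
    next = next.nextNode;
    if (next == null) {
      advanceBucket();
    }
    return result;
  }
}
{code}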
bq. So for case 3, in non-HA setup, I think maybe we do not need to restart the 
processing since there should not be any pending editlog for NN to process in 
startActiveService? In HA setup, since we can always run 
processMisReplicateBlocks in startActiveService, we actually do not need to 
populate replication queue while still in safemode? If we're able to make these 
two changes, for the current patch, we do not need to worry about some 
already-running replication initializing thread.
This can be done. Does "do not need to worry about already-running replication 
initializing" mean we should just return from the call if initialization is 
already in progress?
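To make the question concrete, a hedged sketch (hypothetical names, not the 
actual FSNamesystem code) of "just return if initialization is already in 
progress": a compare-and-set flag so a second caller never starts a duplicate 
replication queue initialization thread.
{code}
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: illustrates the guard, not the real NameNode startup path.
class ReplicationQueueInitializer {
  private final AtomicBoolean initInProgress = new AtomicBoolean(false);

  void initializeReplicationQueuesAsync() {
    // Only the first caller wins; later callers return immediately.
    if (!initInProgress.compareAndSet(false, true)) {
      return;
    }
    Thread initThread = new Thread(() -> {
      try {
        scanBlocksAndPopulateQueues();  // hypothetical placeholder for the scan
      } finally {
        initInProgress.set(false);
      }
    }, "Replication Queue Initializer");
    initThread.setDaemon(true);
    initThread.start();
  }

  private void scanBlocksAndPopulateQueues() {
    // Placeholder: iterate the blocksMap in bounded chunks as discussed above.
  }
}
{code}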


> Make replication queue initialization asynchronous
> --------------------------------------------------
>
>                 Key: HDFS-5496
>                 URL: https://issues.apache.org/jira/browse/HDFS-5496
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Kihwal Lee
>         Attachments: HDFS-5496.patch
>
>
> Today, initialization of replication queues blocks safe mode exit and certain 
> HA state transitions. For a big name space, this can take hundreds of seconds 
> with the FSNamesystem write lock held.  During this time, important requests 
> (e.g. initial block reports, heartbeats, etc.) are blocked.
> The effect of delaying the initialization would be not starting replication 
> right away, but I think the benefit outweighs the cost. If we make it 
> asynchronous, the work per iteration should be limited so that the lock 
> duration is capped (see the sketch after this description).
> If full/incremental block reports and any other requests that modify block 
> state properly perform replication checks while the blocks are scanned and 
> the queues are populated in the background, every block will be processed 
> (some may be processed twice). The replication monitor should run even before 
> all blocks are processed.
> This will allow the namenode to exit safe mode and start serving immediately 
> even with a big name space. It will also reduce the HA failover latency.
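As a rough sketch of the "limited work per iteration" idea above (hypothetical 
names and chunk size, not the actual BlockManager implementation), the scan 
could take and release the namesystem write lock around fixed-size chunks so no 
single lock hold is unbounded, letting block reports and heartbeats interleave 
while the queues are still being populated:
{code}
import java.util.Iterator;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only: hypothetical scanner, not the real FSNamesystem locking code.
class AsyncMisReplicationScanner {
  private static final int BLOCKS_PER_LOCK_HOLD = 10000;  // assumed tuning knob
  private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();

  void processMisReplicatedBlocks(Iterator<Object> blocks) {
    while (blocks.hasNext()) {
      namesystemLock.writeLock().lock();
      try {
        // Handle at most BLOCKS_PER_LOCK_HOLD blocks per lock hold so other
        // operations can acquire the lock between chunks.
        for (int i = 0; i < BLOCKS_PER_LOCK_HOLD && blocks.hasNext(); i++) {
          checkReplication(blocks.next());  // hypothetical per-block check
        }
      } finally {
        namesystemLock.writeLock().unlock();
      }
    }
  }

  private void checkReplication(Object block) {
    // Placeholder for adding the block to neededReplications, invalidates, etc.
  }
}
{code}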



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
