[ 
https://issues.apache.org/jira/browse/HDFS-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-6425:
--------------------------
    Attachment: HDFS-6425-3.patch

Thanks, Kihwal.

Here is the updated patch for trunk, based on a slightly different version. In 
rescanPostponedMisreplicatedBlocks, instead of always picking the first 
blocksPerRescan blocks, the new version picks blocksPerRescan consecutive 
blocks starting at a randomly chosen position. This handles the case where, 
for some reason, some datanodes stay in the content-stale state for a long 
time and affect only the first blocksPerRescan blocks, which would otherwise 
be the only blocks examined on every rescan.
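
To illustrate the selection described above, here is a minimal sketch. It is 
not the code from the attached patch: the class name PostponedRescanSketch, 
the method pickRescanWindow, and the use of a plain List are all hypothetical, 
and the sketch assumes the window does not wrap around the end of the 
collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; names and types are not taken from the patch.
final class PostponedRescanSketch {

    /**
     * Returns up to blocksPerRescan consecutive entries from postponed,
     * starting at a randomly chosen index, so that repeated rescans do not
     * always examine the same leading entries.
     */
    static <T> List<T> pickRescanWindow(List<T> postponed,
                                        int blocksPerRescan,
                                        Random random) {
        int size = postponed.size();
        if (size == 0 || blocksPerRescan <= 0) {
            return new ArrayList<>();
        }
        // Never ask for more entries than exist.
        int count = Math.min(blocksPerRescan, size);
        // Choose a start index such that the whole window fits (no wrap-around).
        int start = random.nextInt(size - count + 1);
        // Copy the window so the caller can iterate without touching the source.
        return new ArrayList<>(postponed.subList(start, start + count));
    }
}
```

With 10 postponed blocks and blocksPerRescan set to 3, each call returns 3 
adjacent blocks whose starting index is drawn uniformly from 0 to 7.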

This new version has been running on our production clusters for a couple of 
months.

Regarding the root cause of the over-replication: we did some analysis a while 
back. It could be due to the IBR scenario you mentioned, but there are other 
sources as well.

1. The load balancer could create spikes of over-replication in our clusters.
2. As part of the machine repair process, we used to bring "unformatted" 
machines back into the cluster.
3. Right after the NN starts up and leaves safe mode, but before all DNs have 
sent their block reports, the NN appears to consider some blocks 
under-replicated and starts replicating them. Later, once the remaining DNs 
send their block reports, the NN ends up with over-replicated blocks.

> Large postponedMisreplicatedBlocks has impact on blockReport latency
> --------------------------------------------------------------------
>
>                 Key: HDFS-6425
>                 URL: https://issues.apache.org/jira/browse/HDFS-6425
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-6425-2.patch, HDFS-6425-3.patch, 
> HDFS-6425-Test-Case.pdf, HDFS-6425.patch
>
>
> Sometimes we have a large number of over-replicated blocks when the NN 
> fails over. When the new active NN takes over, over-replicated blocks are 
> put into postponedMisreplicatedBlocks until none of the DNs for that block 
> are stale anymore.
> We had a case where the NNs flip-flopped. Before 
> postponedMisreplicatedBlocks became empty, the NN failed over again and 
> again, so postponedMisreplicatedBlocks just kept growing until the cluster 
> was stable.
> In addition, a large postponedMisreplicatedBlocks can make 
> rescanPostponedMisreplicatedBlocks slow. rescanPostponedMisreplicatedBlocks 
> takes the write lock, so it can slow down block report processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
