[ 
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079950#comment-14079950
 ] 

Hadoop QA commented on HDFS-6772:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658667/HDFS-6772.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

                  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
                  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7501//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7501//console

This message is automatically generated.

> Get DNs out of blockContentsStale==true state faster when NN restarts
> ---------------------------------------------------------------------
>
>                 Key: HDFS-6772
>                 URL: https://issues.apache.org/jira/browse/HDFS-6772
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-6772.patch
>
>
> Here is the non-HA scenario.
> 1. Get HDFS into a block-over-replicated situation.
> 2. Restart the NN.
> 3. From the NN's point of view, DNs will remain in the blockContentsStale==true
> state for a long time. That in turn makes postponedMisreplicatedBlocks large.
> A larger postponedMisreplicatedBlocks increases blockreport latency.
> Given that blockreport takes the NN global lock, this has a severe impact on NN
> performance and makes the cluster unstable.
> Why will DNs remain in the blockContentsStale==true state for a long time?
> 1. When a DN reconnects to the NN upon NN restart, the blockreport RPC could
> come in before the heartbeat RPC. That is due to how
> BPServiceActor#offerService decides when to send blockreport and heartbeat.
> In the case of NN restart, the NN will ask the DN to register when the NN gets
> the first heartbeat request; the DN will then register with the NN, followed
> by the blockreport RPC; the heartbeat RPC will come after that.
> 2. So right after the first blockreport, given that heartbeatedSinceFailover
> remains false, blockContentsStale will stay true.
> {noformat}
> DatanodeStorageInfo.java
>   void receivedBlockReport() {
>     if (heartbeatedSinceFailover) {
>       blockContentsStale = false;
>     }
>     blockReportCount++;
>   }
> {noformat}
> 3. So the DN will remain in blockContentsStale==true until the next
> blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to
> a large value, so the DN can stay in that state for a long time.
>  
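The ordering problem described above can be reproduced in isolation. The following is a minimal sketch, not the actual Hadoop code: StorageState is a hypothetical stand-in for DatanodeStorageInfo, its receivedBlockReport() mirrors the quoted snippet, and receivedHeartbeat() is an assumption about how heartbeatedSinceFailover gets set. The staleAfter helper and the RPC name strings are illustrative only.

```java
public class StaleStateSketch {

    /** Hypothetical, simplified stand-in for the per-storage state; not the real Hadoop class. */
    static class StorageState {
        boolean heartbeatedSinceFailover = false; // state right after NN restart
        boolean blockContentsStale = true;        // state right after NN restart
        int blockReportCount = 0;

        /** Assumption: a heartbeat is what flips heartbeatedSinceFailover to true. */
        void receivedHeartbeat() {
            heartbeatedSinceFailover = true;
        }

        /** Same logic as the snippet quoted in the issue description. */
        void receivedBlockReport() {
            if (heartbeatedSinceFailover) {
                blockContentsStale = false;
            }
            blockReportCount++;
        }
    }

    /** Replays a sequence of RPC names and returns the resulting stale flag. */
    static boolean staleAfter(String... rpcs) {
        StorageState s = new StorageState();
        for (String rpc : rpcs) {
            switch (rpc) {
                case "heartbeat":
                    s.receivedHeartbeat();
                    break;
                case "blockreport":
                    s.receivedBlockReport();
                    break;
                default:
                    throw new IllegalArgumentException("unknown RPC: " + rpc);
            }
        }
        return s.blockContentsStale;
    }

    public static void main(String[] args) {
        // Order described in the issue: blockreport arrives before the first heartbeat.
        System.out.println(staleAfter("blockreport", "heartbeat"));                // true
        // Heartbeat first: a single blockreport clears the flag.
        System.out.println(staleAfter("heartbeat", "blockreport"));                // false
        // The flag only clears once a second blockreport arrives.
        System.out.println(staleAfter("blockreport", "heartbeat", "blockreport")); // false
    }
}
```

In this sketch the first ordering leaves the storage stale until another blockreport is processed, which in a real cluster would be roughly one dfs.blockreport.intervalMsec later.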



--
This message was sent by Atlassian JIRA
(v6.2#6252)
