[
https://issues.apache.org/jira/browse/HDFS-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090317#comment-14090317
]
Hadoop QA commented on HDFS-6772:
---------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12660508/HDFS-6772-3.patch
against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new
or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/7589//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7589//console
This message is automatically generated.
> Get DNs out of blockContentsStale==true state faster when NN restarts
> ---------------------------------------------------------------------
>
> Key: HDFS-6772
> URL: https://issues.apache.org/jira/browse/HDFS-6772
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ming Ma
> Assignee: Ming Ma
> Attachments: HDFS-6772-2.patch, HDFS-6772-3.patch, HDFS-6772.patch
>
>
> Here is the non-HA scenario.
> 1. Get HDFS into a block-over-replicated situation.
> 2. Restart the NN.
> 3. From the NN's point of view, DNs will remain in blockContentsStale==true state
> for a long time. That in turn makes postponedMisreplicatedBlocks grow large.
> A larger postponedMisreplicatedBlocks increases blockreport latency.
> Because blockreport processing takes the NN global lock, this severely impacts NN
> performance and makes the cluster unstable.
> Why will DNs remain in blockContentsStale==true state for a long time?
> 1. When a DN reconnects to the NN upon NN restart, the blockreport RPC can
> arrive before the heartbeat RPC. That is due to how BPServiceActor#offerService
> decides when to send the blockreport and the heartbeat. In the case of NN
> restart, the NN asks the DN to register when it gets the first heartbeat
> request; the DN then registers with the NN, followed by the blockreport RPC;
> the next heartbeat RPC comes only after that.
> 2. So right after the first blockreport, given heartbeatedSinceFailover
> remains false, blockContentsStale will stay true.
> {noformat}
> DatanodeStorageInfo.java
>   void receivedBlockReport() {
>     if (heartbeatedSinceFailover) {
>       blockContentsStale = false;
>     }
>     blockReportCount++;
>   }
> {noformat}
> 3. So the DN will remain in blockContentsStale==true until the next
> blockreport. For a big cluster, dfs.blockreport.intervalMsec could be set to
> a large value, so that next blockreport may be a long time away.
>
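The ordering problem described above can be reproduced with a small standalone model. This is only a sketch, not the Hadoop code: the class StaleStateSketch, the helper staleAfter, and the method receivedHeartbeat are hypothetical names introduced for illustration, and the only logic taken from the description is the body of receivedBlockReport(). It assumes a heartbeat does nothing more than set heartbeatedSinceFailover.

```java
public class StaleStateSketch {

    /** Simplified stand-in for the per-storage state the NN tracks (assumption). */
    static class StorageState {
        boolean heartbeatedSinceFailover = false; // reset when the NN restarts
        boolean blockContentsStale = true;        // stale until proven fresh
        int blockReportCount = 0;

        // Hypothetical: models only the flag flip caused by a heartbeat.
        void receivedHeartbeat() {
            heartbeatedSinceFailover = true;
        }

        // Same logic as the snippet quoted in the issue description.
        void receivedBlockReport() {
            if (heartbeatedSinceFailover) {
                blockContentsStale = false;
            }
            blockReportCount++;
        }
    }

    /** Replays a sequence of RPC names and returns the final stale flag. */
    static boolean staleAfter(String... rpcs) {
        StorageState state = new StorageState();
        for (String rpc : rpcs) {
            if (rpc.equals("heartbeat")) {
                state.receivedHeartbeat();
            } else if (rpc.equals("blockreport")) {
                state.receivedBlockReport();
            } else {
                throw new IllegalArgumentException("unknown RPC: " + rpc);
            }
        }
        return state.blockContentsStale;
    }

    public static void main(String[] args) {
        // Order seen after NN restart: blockreport first, heartbeat second.
        System.out.println(staleAfter("blockreport", "heartbeat"));                // true
        // Order that would clear the flag on the first blockreport.
        System.out.println(staleAfter("heartbeat", "blockreport"));                // false
        // Restart order plus the next periodic blockreport.
        System.out.println(staleAfter("blockreport", "heartbeat", "blockreport")); // false
    }
}
```

In this model the storage leaves the stale state only once a blockreport arrives after a heartbeat, which is why the restart ordering leaves the DN stale for a full blockreport interval.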
--
This message was sent by Atlassian JIRA
(v6.2#6252)