[ 
https://issues.apache.org/jira/browse/HDFS-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619422#comment-13619422
 ] 

Aaron T. Myers commented on HDFS-4657:
--------------------------------------

The issue is that in {{BlockManager#processReport}} we use the following logic 
to determine whether or not a block report is the initial block report from a 
DN:

{code}
if (node.numBlocks() == 0) {
  // The first block report can be processed a lot more efficiently than
  // ordinary block reports.  This shortens restart times.
  processFirstBlockReport(node, newReport);
} else {
  processReport(node, newReport);
}
{code}

However, in the DN code that actually performs the block report 
({{BPServiceActor#blockReport}}), we always send the incremental report before 
doing the full report:

{code}
// Flush any block information that precedes the block report. Otherwise
// we have a chance that we will miss the delHint information
// or we will report an RBW replica after the BlockReport already reports
// a FINALIZED one.
reportReceivedDeletedBlocks();

// Create block report
long brCreateStartTime = now();
BlockListAsLongs bReport = dn.getFSDataset().getBlockReport(
    bpos.getBlockPoolId());

// Send block report
long brSendStartTime = now();
StorageBlockReport[] report = { new StorageBlockReport(
    new DatanodeStorage(bpRegistration.getStorageID()),
    bReport.getBlockListAsLongs()) };
cmd = bpNamenode.blockReport(bpRegistration, bpos.getBlockPoolId(), report);
{code}

Most of the time when the NN is starting up, the DN won't have any pending 
received or deleted blocks, since the NN will have been down and couldn't have 
been allocating new blocks for clients or issuing deletes. However, in the HA 
case, the active NN may have been up and running just fine, allocating blocks, 
while the standby NN was down. In this case, when the standby NN starts up, it 
will receive an incremental block report first. If any of the blocks in that 
report are not queued for later processing (i.e. the standby has already 
received the edit logs containing the block allocation), they are added to the 
node right away, so {{node.numBlocks()}} is no longer 0 when the first full BR 
arrives. The first full BR is then identified as a non-initial BR and takes the 
ordinary path, which logs the addition of every block.
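
To make the sequence concrete, here is a minimal, self-contained sketch of the 
misclassification. It is not the HDFS code: {{FirstReportDemo}}, {{NodeState}}, 
{{applyIncremental}} and {{applyFull}} are hypothetical stand-ins, and a full 
report is simplified to replacing the node's block set. Only the 
{{numBlocks() == 0}} check mirrors the snippet above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FirstReportDemo {
    /** Hypothetical stand-in for the NN's per-DN bookkeeping. */
    static class NodeState {
        private final Set<Long> blocks = new HashSet<>();

        int numBlocks() {
            return blocks.size();
        }

        /** Incremental report: blocks are added to the node immediately. */
        void applyIncremental(List<Long> received) {
            blocks.addAll(received);
        }

        /** Full report: returns which path the numBlocks() check selects. */
        String applyFull(List<Long> report) {
            String path = (numBlocks() == 0) ? "first" : "ordinary";
            // Simplification: the full report replaces the known block set.
            blocks.clear();
            blocks.addAll(report);
            return path;
        }
    }

    public static void main(String[] args) {
        List<Long> full = Arrays.asList(1L, 2L, 3L);

        // No incremental report beforehand: the fast path is chosen.
        NodeState fresh = new NodeState();
        System.out.println(fresh.applyFull(full)); // prints "first"

        // One block arrives incrementally before the full report:
        // the very first full BR is now treated as an ordinary one.
        NodeState standbyView = new NodeState();
        standbyView.applyIncremental(Arrays.asList(1L));
        System.out.println(standbyView.applyFull(full)); // prints "ordinary"
    }
}
```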

I can think of a few solutions to this issue:

# We could try to ensure that, after registering with an NN, a DN never sends 
an incremental block report before it has sent a full block report.
# We could change the logic at the NN for determining whether or not a full BR 
is the initial full BR. Instead of checking whether or not any blocks have yet 
been reported for the node, we could add a boolean per-DN that expressly 
signifies whether or not a full BR has been processed by this NN for this DN 
yet.
# We could make it so that full block reports, regardless of whether or not 
they are the initial block report from a DN, do not log every block. On one 
hand this may be overkill since the only time I've seen this cause a problem is 
during SBN restart, but on the other hand it doesn't seem like a good idea to 
me to ever log every block reported by a DN, whether it's the initial full BR 
or a later full BR.
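
As a rough illustration of option 2, the sketch below replaces the 
{{numBlocks() == 0}} check with an explicit per-DN flag. Everything here is 
hypothetical ({{FlagBasedDemo}}, {{FlaggedNodeState}}, 
{{fullReportProcessed}}, {{resetOnReregistration}}); it only shows the shape 
of the idea and does not address whether the fast path is safe when some blocks 
are already recorded for the node.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FlagBasedDemo {
    /** Hypothetical per-DN state with an explicit "full BR seen" flag. */
    static class FlaggedNodeState {
        private final Set<Long> blocks = new HashSet<>();
        // Assumed flag: true once this NN has processed a full BR from this DN.
        private boolean fullReportProcessed = false;

        void applyIncremental(List<Long> received) {
            blocks.addAll(received);
        }

        /** The path depends on the flag, not on how many blocks are known. */
        String applyFull(List<Long> report) {
            String path = fullReportProcessed ? "ordinary" : "first";
            // Simplification: the full report replaces the known block set.
            blocks.clear();
            blocks.addAll(report);
            fullReportProcessed = true;
            return path;
        }

        /** Assumed hook: a DN that re-registers starts over. */
        void resetOnReregistration() {
            blocks.clear();
            fullReportProcessed = false;
        }
    }

    public static void main(String[] args) {
        List<Long> full = Arrays.asList(1L, 2L, 3L);
        FlaggedNodeState dn = new FlaggedNodeState();

        // An incremental report arriving first no longer changes the outcome.
        dn.applyIncremental(Arrays.asList(1L));
        System.out.println(dn.applyFull(full)); // prints "first"
        System.out.println(dn.applyFull(full)); // prints "ordinary"
    }
}
```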

I think I'm leaning toward option #3.

Thoughts?
                
> If incremental BR is received before first full BR NN will log a line for 
> every block on a DN
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4657
>                 URL: https://issues.apache.org/jira/browse/HDFS-4657
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.4-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>
> This can impact restart times pretty substantially if the DNs have a lot of 
> blocks, and since the FSNS write lock is held while processing the block 
> report clients will not make any progress.
