[ https://issues.apache.org/jira/browse/HDFS-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619422#comment-13619422 ]
Aaron T. Myers commented on HDFS-4657:
--------------------------------------
The issue is that in {{BlockManager#processReport}} we use the following logic
to determine whether or not a block report is the initial block report from a
DN:
{code}
if (node.numBlocks() == 0) {
  // The first block report can be processed a lot more efficiently than
  // ordinary block reports. This shortens restart times.
  processFirstBlockReport(node, newReport);
} else {
  processReport(node, newReport);
}
{code}
However, in the DN code that actually performs the block report
({{BPServiceActor#blockReport}}), we always send the incremental report before
the full report:
{code}
// Flush any block information that precedes the block report. Otherwise
// we have a chance that we will miss the delHint information
// or we will report an RBW replica after the BlockReport already reports
// a FINALIZED one.
reportReceivedDeletedBlocks();

// Create block report
long brCreateStartTime = now();
BlockListAsLongs bReport = dn.getFSDataset().getBlockReport(
    bpos.getBlockPoolId());

// Send block report
long brSendStartTime = now();
StorageBlockReport[] report = { new StorageBlockReport(
    new DatanodeStorage(bpRegistration.getStorageID()),
    bReport.getBlockListAsLongs()) };
cmd = bpNamenode.blockReport(bpRegistration, bpos.getBlockPoolId(), report);
{code}
Most of the time when the NN is starting up, the DN won't have any pending
received or deleted blocks, since the NN will have been down and couldn't have
been allocating new blocks for clients or issuing deletes. However, in the HA
case, the active NN may have been up and running just fine, allocating blocks
while the standby NN was down. In this case, when the standby NN starts up it
will receive an incremental block report first. If any of the blocks in that
report are not queued for later processing (i.e. the standby has already
received the edit logs containing the block allocation), they are added to the
node right away, so {{node.numBlocks()}} is no longer 0 when the full BR
arrives. The first full BR is then identified as a non-initial BR and goes
through the ordinary {{processReport}} path, which logs the addition of every
block.
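To make the failure mode concrete, here is a minimal, self-contained sketch of
the heuristic. It is not the actual NN code: {{Node}}, {{treatedAsInitial}} and
{{applyIncrementalReport}} are hypothetical stand-ins, and only the
{{numBlocks() == 0}} check is taken from the snippet quoted above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the NN's per-DN state; not the real Hadoop classes.
class InitialReportHeuristicSketch {
    static class Node {
        private final Set<Long> blocks = new HashSet<>();

        int numBlocks() {
            return blocks.size();
        }

        void addBlock(long blockId) {
            blocks.add(blockId);
        }
    }

    // Mirrors the numBlocks() == 0 check quoted from BlockManager#processReport.
    static boolean treatedAsInitial(Node node) {
        return node.numBlocks() == 0;
    }

    // Assumption: every block in the incremental report is already known to
    // the NN from the edit logs, so it is applied instead of being queued.
    static void applyIncrementalReport(Node node, long[] receivedBlockIds) {
        for (long id : receivedBlockIds) {
            node.addBlock(id);
        }
    }

    public static void main(String[] args) {
        // Cold start: nothing reported yet, so the fast path is chosen.
        Node coldStart = new Node();
        System.out.println(treatedAsInitial(coldStart)); // true

        // Standby restart in HA: one incremental report arrives first.
        Node standbyRestart = new Node();
        applyIncrementalReport(standbyRestart, new long[] {1001L});
        System.out.println(treatedAsInitial(standbyRestart)); // false
    }
}
```

A single block applied from the incremental report is enough to push the first
full BR onto the slow path.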
I can think of a few solutions to this issue:
# We could try to ensure that, after registering with an NN, a DN never sends
an incremental block report before it has sent a full block report.
# We could change the logic at the NN for determining whether or not a full BR
is the initial full BR. Instead of checking whether or not any blocks have yet
been reported for the node, we could add a boolean per-DN that expressly
signifies whether or not a full BR has been processed by this NN for this DN
yet.
# We could make it so that full block reports, regardless of whether or not
they are the initial block report from a DN, do not log every block. On one
hand this may be overkill since the only time I've seen this cause a problem is
during SBN restart, but on the other hand it doesn't seem like a good idea to
me to ever log every block reported by a DN, whether it's the initial full BR
or a later full BR.
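As a rough illustration of option 2 only, the sketch below tracks an explicit
per-DN flag instead of inferring "initial" from the block count. All names here
({{Node}}, {{fullReportProcessed}}, {{processFullReport}}, {{Path}}) are
hypothetical and are not taken from the Hadoop code base; a real change would
have to fit the existing NN data structures.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of option 2; not the real BlockManager code.
class FirstReportFlagSketch {
    enum Path { FIRST_REPORT, ORDINARY_REPORT }

    static class Node {
        private final Set<Long> blocks = new HashSet<>();
        // Assumed flag: true once this NN has processed a full BR from this DN.
        private boolean fullReportProcessed = false;

        void applyIncremental(long blockId) {
            blocks.add(blockId);
        }

        int numBlocks() {
            return blocks.size();
        }
    }

    // Chooses the processing path from the flag, not from numBlocks().
    static Path processFullReport(Node node, long[] reportedBlockIds) {
        Path path = node.fullReportProcessed
            ? Path.ORDINARY_REPORT
            : Path.FIRST_REPORT;
        for (long id : reportedBlockIds) {
            node.blocks.add(id);
        }
        node.fullReportProcessed = true;
        return path;
    }

    public static void main(String[] args) {
        Node node = new Node();
        // An incremental report arrives before the first full BR.
        node.applyIncremental(1001L);

        // The first full BR still takes the fast path.
        System.out.println(processFullReport(node, new long[] {1001L, 1002L}));
        // Any later full BR takes the ordinary path.
        System.out.println(processFullReport(node, new long[] {1001L, 1002L}));
    }
}
```

With the flag, an earlier incremental report no longer changes which path the
first full BR takes. This does nothing about per-block logging on later full
BRs, which is what option 3 addresses.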
I think I'm leaning toward option #3.
Thoughts?
> If incremental BR is received before first full BR NN will log a line for
> every block on a DN
> ---------------------------------------------------------------------------------------------
>
> Key: HDFS-4657
> URL: https://issues.apache.org/jira/browse/HDFS-4657
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.0.4-alpha
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
>
> This can impact restart times pretty substantially if the DNs have a lot of
> blocks, and since the FSNS write lock is held while processing the block
> report clients will not make any progress.