[
https://issues.apache.org/jira/browse/HDFS-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated HDFS-2929:
------------------------------
Attachment: hdfs-2929.txt
Actually, on second thought, given that the title of this JIRA is "test and
*fixes* for block synchronization", I'll include the bug fix here.
The bug is the following:
After a block synchronization occurs, the primary datanode (which acted as
coordinator) calls {{commitBlockSynchronization}} on the active NN. This causes
the NN to update the generation stamp on all of the replicas to the new
generation stamp and persist the new genstamp to the edit log. But the standby
doesn't get the new block locations at this time -- it only gets the new
genstamp through the edit log. This means that, if we fail over again after a
synchronization, the newly active NN won't have any correct locations for that
block.
The fix is simple: when a DN updates the genstamp on a block
({{updateReplicaUnderRecovery}}) it adds the block, with its new genstamp, to
the next incremental block report. This causes it to get reported to both the
active and the standby node, so they both have correct information.
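To make the before/after behavior concrete, here is a rough toy model in Java. It is not the HDFS code: {{ToyNameNode}}, {{ToyDataNode}}, and {{standbyLocationsAfterSync}} are hypothetical names made up for this sketch, and it assumes a NN only counts a replica as a valid location when the genstamp reported for it matches the genstamp the NN expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: every class and method here is hypothetical, not the HDFS implementation.
class BlockSyncSketch {

    /** One NameNode's view: the expected genstamp per block, plus the replicas reported to it. */
    static class ToyNameNode {
        final Map<Long, Long> expectedGenStamp = new HashMap<>();
        // blockId -> (datanode name -> genstamp last reported by that datanode)
        final Map<Long, Map<String, Long>> reported = new HashMap<>();

        // Models a genstamp update, whether applied directly or replayed from the edit log.
        void setGenStamp(long blockId, long genStamp) {
            expectedGenStamp.put(blockId, genStamp);
        }

        // Models receipt of one entry of an incremental block report.
        void receiveReport(String dn, long blockId, long genStamp) {
            reported.computeIfAbsent(blockId, k -> new HashMap<>()).put(dn, genStamp);
        }

        // Replicas whose reported genstamp matches the one this NameNode expects.
        List<String> validLocations(long blockId) {
            List<String> result = new ArrayList<>();
            Long expected = expectedGenStamp.get(blockId);
            Map<String, Long> byDn = reported.getOrDefault(blockId, new HashMap<>());
            for (Map.Entry<String, Long> e : byDn.entrySet()) {
                if (e.getValue().equals(expected)) {
                    result.add(e.getKey());
                }
            }
            return result;
        }
    }

    /** A DataNode that sends its reports to every NameNode it knows about. */
    static class ToyDataNode {
        final String name;
        final List<ToyNameNode> nameNodes;
        final boolean reportOnGenStampUpdate; // true models the fix described above

        ToyDataNode(String name, List<ToyNameNode> nameNodes, boolean reportOnGenStampUpdate) {
            this.name = name;
            this.nameNodes = nameNodes;
            this.reportOnGenStampUpdate = reportOnGenStampUpdate;
        }

        void reportToAll(long blockId, long genStamp) {
            for (ToyNameNode nn : nameNodes) {
                nn.receiveReport(name, blockId, genStamp);
            }
        }

        // With the fix, the updated block goes into the next incremental block report.
        void updateReplicaUnderRecovery(long blockId, long newGenStamp) {
            if (reportOnGenStampUpdate) {
                reportToAll(blockId, newGenStamp);
            }
        }
    }

    /** Runs one synchronization and returns the locations the standby considers valid. */
    static List<String> standbyLocationsAfterSync(boolean withFix) {
        ToyNameNode active = new ToyNameNode();
        ToyNameNode standby = new ToyNameNode();
        ToyDataNode dn = new ToyDataNode("dn1", Arrays.asList(active, standby), withFix);

        // Initial state: both NNs know the block at genstamp 100 on dn1.
        active.setGenStamp(1L, 100L);
        standby.setGenStamp(1L, 100L);
        dn.reportToAll(1L, 100L);

        // Block synchronization bumps the genstamp to 101 on the DN.
        dn.updateReplicaUnderRecovery(1L, 101L);
        // Stand-in for commitBlockSynchronization: the active updates genstamp and replicas.
        active.setGenStamp(1L, 101L);
        active.receiveReport("dn1", 1L, 101L);
        // The standby only sees the new genstamp via the edit log.
        standby.setGenStamp(1L, 101L);

        return standby.validLocations(1L);
    }
}
```

In this model, {{standbyLocationsAfterSync(false)}} comes back empty because the standby still holds the replica at the old genstamp, while {{standbyLocationsAfterSync(true)}} returns {{dn1}} because the DN reported the new genstamp to both NNs.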
I think this will also improve behavior in the non-HA case: if the
{{commitBlockSynchronization}} call fails, we'll have more up-to-date
information at the NN, allowing reads of the in-progress block to continue
during replica recovery.
> HA: stress test and fixes for block synchronization
> ---------------------------------------------------
>
> Key: HDFS-2929
> URL: https://issues.apache.org/jira/browse/HDFS-2929
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Attachments: hdfs-2929.txt
>
>
> We have a couple of TODOs in {{commitBlockSynchronization}} and {{syncBlock}}
> around HA. I think the current behavior may in fact be correct, but I plan to
> write a stress test / functional test for better coverage of this area.