[ https://issues.apache.org/jira/browse/HDFS-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HDFS-1984:
------------------------------
Attachment: hdfs-1984.txt
Here's a patch that does the above and also adds two new test cases:
1) simulates a corrupt byte while transferring the image, making sure the
corruption is detected and the upload is rejected
2) runs two 2NNs interleaved, using Mockito, to make sure that they don't
interfere with each other
I also ran the test from the command line as described above. I was able to run
two 2NNs both checkpointing as fast as they could. There was one minor
unrelated race condition that I'll address as a followup.
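
For readers who haven't looked at the patch, the idea behind the first test case can be roughly sketched as below. This is not the code in hdfs-1984.txt. It assumes the transfer is validated by comparing a digest of the received bytes against one supplied by the sender. The choice of MD5 and the names ImageTransferCheck, md5 and acceptUpload are all hypothetical, picked only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: none of these names come from the HDFS patch.
public class ImageTransferCheck {

    /** Computes a digest of the given bytes (MD5 is an assumption here). */
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Accepts the upload only if the received bytes match the sender's digest. */
    static boolean acceptUpload(byte[] received, byte[] expectedDigest) {
        return MessageDigest.isEqual(md5(received), expectedDigest);
    }

    public static void main(String[] args) {
        byte[] image = "fsimage contents".getBytes(StandardCharsets.UTF_8);
        byte[] digest = md5(image);

        // Simulate a single corrupt byte introduced during the transfer.
        byte[] corrupted = image.clone();
        corrupted[3] ^= 0x01;

        System.out.println("intact accepted:    " + acceptUpload(image, digest));
        System.out.println("corrupted accepted: " + acceptUpload(corrupted, digest));
    }
}
```

Flipping even one bit changes the digest, so the receiver can reject the upload without knowing where the corruption occurred.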
> HDFS-1073: Enable multiple checkpointers to run simultaneously
> --------------------------------------------------------------
>
> Key: HDFS-1984
> URL: https://issues.apache.org/jira/browse/HDFS-1984
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Affects Versions: Edit log branch (HDFS-1073)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-1984.txt
>
>
> One of the motivations of HDFS-1073 is that it decouples the checkpoint
> process so that multiple checkpoints could be taken at the same time and not
> interfere with each other.
> Currently on the 1073 branch this doesn't quite work right, since we have
> some state and validation in FSImage that's tied to a single fsimage_N --
> thus if two 2NNs perform a checkpoint at different transaction IDs, only one
> will succeed.
> As a stress test, we can run two 2NNs, each configured with
> fs.checkpoint.interval set to "0", which causes them to checkpoint
> continuously, as fast as they can.