[ https://issues.apache.org/jira/browse/HDFS-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HDFS-2691:
------------------------------
Attachment: hdfs-2691.txt
The attached patch is what I've been testing with HBase on a cluster for a
little while.
The approach is to send RBW replicas as part of the "block received and
deleted" reports. There are a couple of potential optimizations we could do
here:
1) only do this when HA is enabled?
2) change the client so that when it calls hflush, it sends a flag to the DN
which causes it to report an RBW replica (so this only happens for blocks
getting hsynced/hflushed)
3) only send these reports when a failover is detected (as discussed above)
Would really appreciate feedback on the correct design here.
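To make the three options easier to compare, here is a rough sketch of how
they could be modeled as a reporting policy on the DN side. This is not the
code in the attached patch: RbwReportSketch, Status, Entry, Policy and every
method name below are hypothetical, invented only for illustration, and the
real report types in the HA branch may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: illustrative names, not the Hadoop classes touched by the patch. */
public class RbwReportSketch {
    /** Status carried by each entry of a hypothetical incremental block report. */
    enum Status { RECEIVING_RBW, RECEIVED, DELETED }

    /** One report entry: a block ID plus what happened to its replica on this DN. */
    static final class Entry {
        final long blockId;
        final Status status;
        Entry(long blockId, Status status) {
            this.blockId = blockId;
            this.status = status;
        }
    }

    /** Hypothetical knobs, one per optimization 1), 2) and 3) in the list above. */
    static final class Policy {
        final boolean onlyWhenHaEnabled;
        final boolean onlyWhenHflushed;
        final boolean onlyAfterFailover;
        Policy(boolean onlyWhenHaEnabled, boolean onlyWhenHflushed, boolean onlyAfterFailover) {
            this.onlyWhenHaEnabled = onlyWhenHaEnabled;
            this.onlyWhenHflushed = onlyWhenHflushed;
            this.onlyAfterFailover = onlyAfterFailover;
        }
    }

    /** Decide whether a replica that is still being written should be reported. */
    static boolean shouldReportRbw(Policy p, boolean haEnabled,
                                   boolean hflushed, boolean failoverDetected) {
        if (p.onlyWhenHaEnabled && !haEnabled) return false;
        if (p.onlyWhenHflushed && !hflushed) return false;
        if (p.onlyAfterFailover && !failoverDetected) return false;
        return true;
    }

    /** Queue an RBW entry on the pending report if the policy allows it. */
    static void onReplicaBeingWritten(List<Entry> pending, long blockId, Policy p,
                                      boolean haEnabled, boolean hflushed,
                                      boolean failoverDetected) {
        if (shouldReportRbw(p, haEnabled, hflushed, failoverDetected)) {
            pending.add(new Entry(blockId, Status.RECEIVING_RBW));
        }
    }

    public static void main(String[] args) {
        List<Entry> pending = new ArrayList<>();
        Policy unconditional = new Policy(false, false, false); // always report
        Policy hflushOnly = new Policy(false, true, false);     // option 2 alone

        onReplicaBeingWritten(pending, 1L, unconditional, false, false, false);
        onReplicaBeingWritten(pending, 2L, hflushOnly, true, false, false);
        onReplicaBeingWritten(pending, 3L, hflushOnly, true, true, false);

        for (Entry e : pending) {
            System.out.println(e.blockId + " " + e.status);
        }
    }
}
```

With all three flags off, every RBW replica is reported. Each flag that is
turned on narrows when reports are sent, which trades report traffic against
how much the standby knows at failover time.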
I also plan to continue testing this - there's still some weirdness where RBW
replicas show up as "corrupt" for a short while after a failover, but then seem
to fix themselves with no further effort - maybe just a metrics thing.
> HA: Tests and fixes for pipeline targets and replica recovery
> -------------------------------------------------------------
>
> Key: HDFS-2691
> URL: https://issues.apache.org/jira/browse/HDFS-2691
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Attachments: hdfs-2691.txt, hdfs-2691.txt
>
>
> Currently there are some TODOs around pipeline/recovery code in the HA
> branch. For example, commitBlockSynchronization only gets sent to the active
> NN which may have failed over by that point. So, we need to write some tests
> here and figure out what the correct behavior is.
> Another related area is the treatment of targets in the pipeline. When a
> pipeline is created, the active NN adds the "expected locations" to the
> BlockInfoUnderConstruction, but the DN identifiers aren't logged with the
> OP_ADD. So after a failover, the BlockInfoUnderConstruction will have no
> targets and I imagine replica recovery would probably trigger some issues.
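The missing-targets problem in the description above can be illustrated with
a small toy model. Everything here is an assumption made for the example:
LostTargetsSketch, AddOp and NameNodeView are hypothetical names, and the map
only stands in for the expected locations that BlockInfoUnderConstruction
tracks; none of it is the actual NameNode or edit log code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: illustrative names, not the real NameNode classes. */
public class LostTargetsSketch {
    /** Hypothetical edit-log record for a new block; note that it carries no DN list. */
    static final class AddOp {
        final long blockId;
        AddOp(long blockId) { this.blockId = blockId; }
    }

    /** Minimal stand-in for one NameNode's view of under-construction blocks. */
    static final class NameNodeView {
        final Map<Long, List<String>> expectedLocations = new HashMap<>();

        /** Active path: remember the pipeline targets in memory, then log the op. */
        AddOp allocate(long blockId, List<String> targets) {
            expectedLocations.put(blockId, new ArrayList<>(targets));
            return new AddOp(blockId); // the targets are not written to the log
        }

        /** Standby path: replaying the op recreates the block but not its targets. */
        void replay(AddOp op) {
            expectedLocations.putIfAbsent(op.blockId, new ArrayList<>());
        }
    }

    public static void main(String[] args) {
        NameNodeView active = new NameNodeView();
        NameNodeView standby = new NameNodeView();

        AddOp op = active.allocate(1L, Arrays.asList("dn1", "dn2", "dn3"));
        standby.replay(op);

        System.out.println("active targets:  " + active.expectedLocations.get(1L));
        System.out.println("standby targets: " + standby.expectedLocations.get(1L));
    }
}
```

In this model the standby ends up with an empty target list for the block,
which is the state that replica recovery would have to cope with after a
failover.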