[
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243633#comment-14243633
]
Jesse Yates commented on HDFS-6440:
-----------------------------------
bq. Does this mean that there might be multiple SNNs marking themselves as
'primary checkpointer' during the same time period, since it is determined by
the SNN itself?
Yes, that is a possibility, which is what I was getting at with my comment
about the primary checkpointer "ping-ponging". The images would have small
deltas, but the ANN would be kept up to date. As the updates slow down, one of
the checkpointers would eventually win. However, we have either (a) not seen
this show up on any of our clusters or (b) never noticed any service issues
because of it.
bq. Would it be reasonable to also let the ANN reject the fsimage upload request?
Sure, it's possible. My concern was ensuring that the ANN has the most
up-to-date checkpoint while letting the SNNs sort themselves out. Rejecting
uploads seems a bit more intrusive in the code, since you also need to
differentiate the source: you don't want to reject an update from the primary
checkpointer that occurs just because of the time elapsed. I'd say it's worth
looking into in a follow-up JIRA, though; this is already a pretty large change.
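
To make the source-differentiation point concrete, here is a minimal sketch of
what an ANN-side check might look like. Nothing here is from the patch:
ImageUploadGate, shouldAccept and takeoverIntervalMs are hypothetical names, and
the policy itself is only an assumption for illustration. It always accepts a
newer image from the current primary, and accepts one from another SNN only
once the current image has gone stale.

```java
// Hypothetical sketch only; not taken from the HDFS-6440 patch.
class ImageUploadGate {
    // Assumed tunable: how long the ANN waits before letting another SNN take over.
    private final long takeoverIntervalMs;
    private String primaryId = null; // SNN the ANN currently treats as primary
    private long lastTxId = -1L;     // txid of the newest accepted image
    private long lastAcceptMs = 0L;  // time at which that image was accepted

    ImageUploadGate(long takeoverIntervalMs) {
        this.takeoverIntervalMs = takeoverIntervalMs;
    }

    // Decide whether an fsimage upload from sourceId should be accepted.
    synchronized boolean shouldAccept(String sourceId, long imageTxId, long nowMs) {
        if (imageTxId <= lastTxId) {
            return false; // never replace an image with an older or identical one
        }
        boolean fromPrimary = sourceId.equals(primaryId);
        boolean stale = primaryId == null
                || nowMs - lastAcceptMs >= takeoverIntervalMs;
        if (!fromPrimary && !stale) {
            return false; // a non-primary SNN racing the primary: reject
        }
        // Accepted: the uploader is (or becomes) the primary checkpointer.
        primaryId = sourceId;
        lastTxId = imageTxId;
        lastAcceptMs = nowMs;
        return true;
    }

    synchronized String primary() {
        return primaryId;
    }
}
```

With this shape, a time-triggered checkpoint from the primary is accepted even
when its delta is tiny, while a second SNN can only take over after the primary
has been quiet for takeoverIntervalMs. Whether the real change should key on
elapsed time, transaction count, or both is exactly the kind of question for the
follow-up.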
> Support more than 2 NameNodes
> -----------------------------
>
> Key: HDFS-6440
> URL: https://issues.apache.org/jira/browse/HDFS-6440
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: auto-failover, ha, namenode
> Affects Versions: 2.4.0
> Reporter: Jesse Yates
> Assignee: Jesse Yates
> Attachments: Multiple-Standby-NameNodes_V1.pdf,
> hdfs-6440-cdh-4.5-full.patch, hdfs-multiple-snn-trunk-v0.patch
>
>
> Most of the work is already done to support more than 2 NameNodes (one
> active, one standby). This would be the last bit to support running multiple
> _standby_ NameNodes; one of the standbys should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some
> complexity around managing the checkpointing, and updating a whole lot of
> tests.