[
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538930#comment-17538930
]
ZanderXu commented on HDFS-15562:
---------------------------------
[~shv] [~aihuaxu] are you still following up on this issue? In our production
cluster, one NameService contains multiple Observer NameNodes. We stopped one
Observer in order to take it offline, and that caused the Standby to keep
doing checkpoints abnormally.
bq. We may add a logic for the Checkpointer to not re-create an image if it was
created recently
`lastCheckpointTime` already exists, but it is not updated when an exception
occurs.
bq. we see transfers fail once in a while, so just ignoring image transfer
failures isn't right.
The Standby can make a best effort to upload the latest fsImage to all
NameNodes. If the upload to an abnormal NameNode still fails after the Standby
has retried several times, it should be fine for the Standby to just ignore
that NameNode.
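To make the idea concrete, here is a rough sketch of what I mean. It is not the actual StandbyCheckpointer code: the class `CheckpointUploadSketch`, the `ImageUploader` interface, and the `maxAttempts` parameter are hypothetical names used only for illustration. Only `lastCheckpointTime` comes from the existing code. The sketch retries each upload a bounded number of times, collects the NameNodes that still fail, and records the checkpoint time regardless, so one unreachable NameNode does not cause the checkpoint to be redone immediately.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "retry, then ignore"; not the real StandbyCheckpointer. */
class CheckpointUploadSketch {

    /** Hypothetical stand-in for uploading the fsImage to one remote NameNode. */
    interface ImageUploader {
        void upload(String nameNode) throws Exception;
    }

    private final int maxAttempts;    // assumed retry budget per NameNode
    private long lastCheckpointTime;  // mirrors the field discussed in this issue

    CheckpointUploadSketch(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /**
     * Tries to upload the image to every NameNode, retrying each one up to
     * maxAttempts times. Returns the NameNodes whose uploads still failed.
     */
    List<String> uploadToAll(List<String> nameNodes, ImageUploader uploader, long now) {
        List<String> failed = new ArrayList<>();
        for (String nn : nameNodes) {
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                try {
                    uploader.upload(nn);
                    ok = true;
                } catch (Exception e) {
                    // Real code would log the failure and probably back off
                    // before the next attempt.
                }
            }
            if (!ok) {
                failed.add(nn);  // give up on this NameNode for this round
            }
        }
        // The local checkpoint itself succeeded, so record its time even if
        // some uploads failed; this is what stops the repeated checkpoints.
        lastCheckpointTime = now;
        return failed;
    }

    long getLastCheckpointTime() {
        return lastCheckpointTime;
    }
}
```

Returning the failed NameNodes instead of silently dropping them would let the caller log them or retry the upload on the next regular checkpoint, which partly addresses the concern that transfer failures should not simply be ignored.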
[~shv] [~ferhui] [~hexiaoqiao] do you have any good ideas about this? I would
be happy to work on it.
BTW, do we need a mechanism to actively trigger a checkpoint?
> StandbyCheckpointer will do checkpoint repeatedly while connecting
> observer/active namenode failed
> --------------------------------------------------------------------------------------------------
>
> Key: HDFS-15562
> URL: https://issues.apache.org/jira/browse/HDFS-15562
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: SunHao
> Assignee: Aihua Xu
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-15562.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We find that the standby namenode does checkpoints over and over when
> connecting to the observer/active namenode fails.
> StandbyCheckpointer does not update “lastCheckpointTime” when uploading the
> new fsimage to the other namenode fails, so the standby namenode keeps doing
> checkpoints repeatedly.