[ 
https://issues.apache.org/jira/browse/HDFS-15562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538930#comment-17538930
 ] 

ZanderXu commented on HDFS-15562:
---------------------------------

[~shv] [~aihuaxu] are you still following up on this issue? In our production 
cluster, one NameService contains multiple Observer NameNodes. When we stopped 
one Observer in order to take it offline, the Standby started checkpointing 
abnormally.

bq. We may add a logic for the Checkpointer to not re-create an image if it was 
created recently
`lastCheckpointTime` already exists, but it is not updated when an exception 
occurs (for example, when the image upload fails).
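
To illustrate the point above, here is a minimal, self-contained sketch. It is 
not the real StandbyCheckpointer code: the class `CheckpointTimerSketch`, the 
`ImageUploader` interface, and the `updateOnUploadFailure` flag are all 
hypothetical names, and the real checkpointer has other triggers that are left 
out here. It only shows how a time-based check behaves when the timestamp is 
or is not advanced after a failed upload.

```java
import java.io.IOException;

// Hypothetical sketch; this is NOT the actual StandbyCheckpointer implementation.
class CheckpointTimerSketch {
    /** Stand-in for uploading the new fsimage to the other NameNodes. */
    interface ImageUploader {
        void upload() throws IOException;
    }

    private final long periodMs;
    private final boolean updateOnUploadFailure; // assumed switch, for comparison only
    private long lastCheckpointTime = 0;
    private int checkpointCount = 0;

    CheckpointTimerSketch(long periodMs, boolean updateOnUploadFailure) {
        this.periodMs = periodMs;
        this.updateOnUploadFailure = updateOnUploadFailure;
    }

    /** One pass of the periodic check at time nowMs. */
    void tick(long nowMs, ImageUploader uploader) {
        if (nowMs - lastCheckpointTime < periodMs) {
            return; // a checkpoint was made recently, nothing to do
        }
        checkpointCount++; // the image has been saved locally at this point
        boolean uploaded = true;
        try {
            uploader.upload();
        } catch (IOException e) {
            uploaded = false; // e.g. an Observer that has been stopped
        }
        // If the timestamp only advances on success, the next tick checkpoints again.
        if (uploaded || updateOnUploadFailure) {
            lastCheckpointTime = nowMs;
        }
    }

    int getCheckpointCount() {
        return checkpointCount;
    }

    /** Runs 10 checks, one minute apart, against an uploader that always fails. */
    static int simulate(boolean updateOnUploadFailure) {
        long period = 3_600_000L; // one hour, chosen only for the example
        CheckpointTimerSketch c = new CheckpointTimerSketch(period, updateOnUploadFailure);
        ImageUploader failing = () -> { throw new IOException("observer unreachable"); };
        for (int i = 0; i < 10; i++) {
            c.tick(period + i * 60_000L, failing);
        }
        return c.getCheckpointCount();
    }

    public static void main(String[] args) {
        System.out.println("timestamp not updated on failure: " + simulate(false));
        System.out.println("timestamp updated on failure:     " + simulate(true));
    }
}
```

In this simulation the first variant checkpoints on every one of the 10 checks, 
while the second checkpoints once and then waits for the next period.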

bq. we see transfers fail once in a while, so just ignoring image transfer 
failures isn't right.
Standby can upload the latest fsImage to all NameNodes on a best-effort basis. 
For an abnormal NameNode, if the upload still fails after Standby has retried 
several times, it should be fine for Standby to just ignore it.
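
The best-effort idea could look roughly like the sketch below. This is only an 
illustration under my own assumptions: `BestEffortUpload`, `Uploader`, 
`uploadToAll`, and `maxAttempts` are hypothetical names and do not exist in 
Hadoop, and a real change would also need backoff and logging.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "retry a few times, then skip the bad NameNode".
class BestEffortUpload {
    /** Stand-in for the image transfer to one NameNode. */
    interface Uploader {
        void uploadTo(String target) throws IOException;
    }

    /**
     * Tries every target up to maxAttempts times and returns the targets
     * that still failed, instead of failing the whole checkpoint.
     */
    static List<String> uploadToAll(List<String> targets, Uploader uploader, int maxAttempts) {
        List<String> failed = new ArrayList<>();
        for (String target : targets) {
            boolean ok = false;
            for (int attempt = 1; attempt <= maxAttempts && !ok; attempt++) {
                try {
                    uploader.uploadTo(target);
                    ok = true;
                } catch (IOException e) {
                    // Keep retrying until maxAttempts is reached.
                }
            }
            if (!ok) {
                failed.add(target); // give up on this NameNode, continue with the rest
            }
        }
        return failed;
    }
}
```

The caller can log the returned list and still treat the checkpoint as done, so 
one stopped Observer does not block uploads to the healthy NameNodes.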

[~shv] [~ferhui] [~hexiaoqiao] do you have any good ideas about this? I will 
be happy to work on it.

BTW, do we need a mechanism to actively trigger a checkpoint?

> StandbyCheckpointer will do checkpoint repeatedly while connecting 
> observer/active namenode failed
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15562
>                 URL: https://issues.apache.org/jira/browse/HDFS-15562
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: SunHao
>            Assignee: Aihua Xu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-15562.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We find that the standby namenode checkpoints over and over when connecting 
> to the observer/active namenode fails.
> StandbyCheckpointer does not update “lastCheckpointTime” when uploading the 
> new fsimage to the other namenode fails, so the standby namenode keeps 
> checkpointing repeatedly.



