[ 
https://issues.apache.org/jira/browse/HDFS-16836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634619#comment-17634619
 ] 

Xiaoqiao He commented on HDFS-16836:
------------------------------------

{quote}What we have seen is #1 succeeded but #2 failed so 
needRollbackCheckpoint is never set back to false and all the subsequent 
checkpointings are just continuously triggering rollback fsimage for RU even 
after RU is finalized. This bypasses the checkpoint period and threshold 
check.{quote}
Thanks for the detailed explanation. It makes sense to me.

> StandbyCheckpointer can still trigger rollback fs image after RU is finalized
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-16836
>                 URL: https://issues.apache.org/jira/browse/HDFS-16836
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Lei Yang
>            Assignee: Lei Yang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.5
>
>
> StandbyCheckpointer triggers a rollback fsimage checkpoint when a rolling
> upgrade (RU) is started.
> When RU is started, a flag (needRollbackImage) is set to true during edit 
> log replay.
> It is only reset to false when doCheckpoint() succeeds.
> Consider the following scenario:
>  # Start RU; needRollbackImage is set to true.
>  # doCheckpoint() fails.
>  # RU is finalized.
>  # namesystem.getFSImage().hasRollbackFSImage() is always false, since 
> a rollback image cannot be generated once RU is over.
>  # needRollbackImage is never set back to false.
>  # The checkpoint threshold (1M txns) and period (1 hr) are not honored.
> {code:java}
> StandbyCheckpointer:
> void doWork() {
>  ....
>   doCheckpoint();
>   // reset needRollbackCheckpoint to false only when we finish a ckpt
>   // for rollback image
>   if (needRollbackCheckpoint
>       && namesystem.getFSImage().hasRollbackFSImage()) {
>     namesystem.setCreatedRollbackImages(true);
>     namesystem.setNeedRollbackFsImage(false);
>   }
>   lastCheckpointTime = now;
> } {code}
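
The scenario in the description can be reproduced with a small toy model. This is only a sketch, not Hadoop code: the class RollbackFlagSketch, its fields, and the applyGuard switch are hypothetical names made up for illustration. The guard (clear the flag once RU is no longer active) is just one possible way to break the loop and is not necessarily what the actual patch does. The model assumes, as the description states, that a pending rollback checkpoint takes precedence over the txn threshold and period checks.

```java
/** Toy model of the stuck-flag scenario; hypothetical names, not Hadoop code. */
public class RollbackFlagSketch {
    private boolean needRollbackImage;    // set when RU starts (edit log replay)
    private boolean rollingUpgradeActive; // true between RU start and finalize
    private boolean hasRollbackFsImage;   // true once a rollback image exists
    private boolean failNextCheckpoint;   // test hook: make one checkpoint fail

    void startRollingUpgrade() {
        rollingUpgradeActive = true;
        needRollbackImage = true;
    }

    void finalizeRollingUpgrade() {
        rollingUpgradeActive = false;
        hasRollbackFsImage = false; // no rollback image can exist after RU ends
    }

    /** Returns true on success; a rollback image is only created during RU. */
    private boolean doCheckpoint() {
        if (failNextCheckpoint) {
            failNextCheckpoint = false;
            return false;
        }
        if (rollingUpgradeActive) {
            hasRollbackFsImage = true;
        }
        return true;
    }

    /** One pass of the checkpointer loop; returns true if a checkpoint ran. */
    boolean doWorkOnce(boolean thresholdOrPeriodReached, boolean applyGuard) {
        // Hypothetical guard: drop the stale flag once RU is over.
        if (applyGuard && !rollingUpgradeActive) {
            needRollbackImage = false;
        }
        // A pending rollback checkpoint bypasses the threshold/period check.
        if (!needRollbackImage && !thresholdOrPeriodReached) {
            return false;
        }
        // Mirrors the snippet above: reset only after a successful checkpoint
        // that actually produced a rollback image.
        if (doCheckpoint() && needRollbackImage && hasRollbackFsImage) {
            needRollbackImage = false;
        }
        return true;
    }

    /** Counts checkpoints triggered after finalize with no threshold reached. */
    public static int checkpointsAfterFinalize(boolean applyGuard, int iterations) {
        RollbackFlagSketch nn = new RollbackFlagSketch();
        nn.startRollingUpgrade();
        nn.failNextCheckpoint = true;
        nn.doWorkOnce(false, applyGuard); // checkpoint fails, flag stays true
        nn.finalizeRollingUpgrade();
        int triggered = 0;
        for (int i = 0; i < iterations; i++) {
            if (nn.doWorkOnce(false, applyGuard)) {
                triggered++;
            }
        }
        return triggered;
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + checkpointsAfterFinalize(false, 5));
        System.out.println("with guard: " + checkpointsAfterFinalize(true, 5));
    }
}
```

In this model, every pass after finalization triggers a checkpoint when the guard is off, even though neither the threshold nor the period has been reached. With the guard on, none does.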



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

