[ 
https://issues.apache.org/jira/browse/HDFS-16836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634144#comment-17634144
 ] 

Xiaoqiao He edited comment on HDFS-16836 at 11/15/22 4:10 AM:
--------------------------------------------------------------

{quote}I mean doCheckpoint can fail and throw an exception, so needRollbackImage
is never reset to false and can leak after RU is done.
{quote}
Thanks. IIUC, when a rollback is triggered, `needRollbackCheckpoint` is set to
false only after a checkpoint for the rollback image finishes, as the code
comment says. If we also reset it when an exception is thrown, the rollback
image may never be saved, right? I'm not sure your proposal is the right way.
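
To make the trade-off concrete, here is a hypothetical model comparing two ways to clear the flag after a failed doCheckpoint(): resetting it unconditionally in a `finally` block (the case raised in the question above), versus clearing it only once a rollback image exists or RU is no longer in progress. This is a sketch, not Hadoop code: `ResetPolicyModel`, both policy names, and the helper methods are invented for illustration, and the model assumes a rollback image can only be produced while RU is in progress.

{code:java}
// Hypothetical model, NOT Hadoop code: all names except needRollbackImage
// are invented to compare two ways of clearing the flag.
final class ResetPolicyModel {
    enum Policy { RESET_IN_FINALLY, RESET_WHEN_RU_DONE }

    boolean needRollbackImage = true;          // RU start was replayed
    boolean rollingUpgradeInProgress = true;
    boolean hasRollbackImage = false;
    final Policy policy;

    ResetPolicyModel(Policy policy) { this.policy = policy; }

    /** One loop iteration; checkpointSucceeds=false simulates doCheckpoint() throwing. */
    void doWork(boolean checkpointSucceeds) {
        boolean attemptRollback = needRollbackImage && rollingUpgradeInProgress;
        try {
            if (!checkpointSucceeds) {
                throw new IllegalStateException("simulated doCheckpoint() failure");
            }
            if (attemptRollback) {
                hasRollbackImage = true;        // assumed: only possible during RU
            }
        } catch (IllegalStateException e) {
            // Caught so the model keeps looping, like a checkpointer that retries.
        } finally {
            if (policy == Policy.RESET_IN_FINALLY) {
                needRollbackImage = false;      // cleared even when the attempt failed
            }
        }
        if (policy == Policy.RESET_WHEN_RU_DONE
                && needRollbackImage
                && (hasRollbackImage || !rollingUpgradeInProgress)) {
            needRollbackImage = false;          // cleared on success, or once RU is over
        }
    }

    /** One failed and one successful iteration, both while RU is in progress. */
    static ResetPolicyModel failThenSucceed(Policy policy) {
        ResetPolicyModel m = new ResetPolicyModel(policy);
        m.doWork(false);
        m.doWork(true);
        return m;
    }

    /** A failed iteration during RU, then RU is finalized before the next success. */
    static ResetPolicyModel failThenFinalize(Policy policy) {
        ResetPolicyModel m = new ResetPolicyModel(policy);
        m.doWork(false);
        m.rollingUpgradeInProgress = false;
        m.doWork(true);
        return m;
    }

    public static void main(String[] args) {
        for (Policy p : Policy.values()) {
            ResetPolicyModel a = failThenSucceed(p);
            ResetPolicyModel b = failThenFinalize(p);
            System.out.println(p + ": rollback image saved during RU = " + a.hasRollbackImage
                    + ", flag leaked after finalize = " + b.needRollbackImage);
        }
    }
}
{code}

Under these assumptions, both policies avoid the leak after finalization, but only RESET_WHEN_RU_DONE still produces the rollback image when the first attempt fails during RU; RESET_IN_FINALLY reports `rollback image saved during RU = false`.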


was (Author: hexiaoqiao):
{quote}I mean doCheckpoint can fail and throw an exception, so needRollbackImage
is never reset to false and can leak after RU is done.
{quote}
Thanks. IIUC, when a rollback is triggered, `needRollbackCheckpoint` is set to
false only after a checkpoint for the rollback image finishes, as the code
comment says. If we also reset it when an exception is thrown, the rollback
image may never be saved, right? I'm not sure this is the right way.

> StandbyCheckpointer can still trigger rollback fs image after RU is finalized
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-16836
>                 URL: https://issues.apache.org/jira/browse/HDFS-16836
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Lei Yang
>            Priority: Major
>              Labels: pull-request-available
>
> StandbyCheckpointer triggers a rollback fsimage when RU is started.
> When RU is started, a flag (needRollbackImage) is set to true during edit
> log replay, and it is only reset to false when doCheckpoint() succeeds.
> Consider the following scenario:
>  # Start RU; needRollbackImage is set to true.
>  # doCheckpoint() fails.
>  # RU is finalized.
>  # namesystem.getFSImage().hasRollbackFSImage() is always false, since a
> rollback image cannot be generated once RU is over.
>  # needRollbackImage is never reset to false.
>  # The checkpoint threshold (1M txns) and period (1 hr) are no longer honored.
> {code:java}
> StandbyCheckpointer:
> void doWork() {
>  ....
>   doCheckpoint();
>   // reset needRollbackCheckpoint to false only when we finish a ckpt
>   // for rollback image
>   if (needRollbackCheckpoint
>       && namesystem.getFSImage().hasRollbackFSImage()) {
>     namesystem.setCreatedRollbackImages(true);
>     namesystem.setNeedRollbackFsImage(false);
>   }
>   lastCheckpointTime = now;
> } {code}
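
The scenario above can be reproduced with a small, self-contained model. It is a hypothetical sketch, not Hadoop code: `CheckpointModel`, `shouldCheckpoint`, and `runScenario` are invented for illustration; only the flag name `needRollbackImage` and the 1M-txn / 1-hr values come from the description. It assumes that a pending rollback image forces a checkpoint on every iteration, bypassing the threshold and period checks, which is how the last step of the scenario reads.

{code:java}
// Hypothetical model of the checkpoint decision described in HDFS-16836.
// NOT Hadoop code: all names except needRollbackImage are illustrative.
final class CheckpointModel {
    static final long TXN_THRESHOLD = 1_000_000L;  // 1M txns, per the issue text
    static final long PERIOD_SECS = 3600L;         // 1 hr, per the issue text

    boolean needRollbackImage;         // set when RU start is replayed
    boolean rollingUpgradeInProgress;
    boolean hasRollbackImage;          // true once a rollback fsimage exists

    /** Assumed ordering: a pending rollback image forces a checkpoint. */
    boolean shouldCheckpoint(long uncheckpointedTxns, long secsSinceLast) {
        if (needRollbackImage) {
            return true;               // threshold and period are bypassed
        }
        return uncheckpointedTxns >= TXN_THRESHOLD || secsSinceLast >= PERIOD_SECS;
    }

    /** One doWork() iteration; checkpointSucceeds=false simulates doCheckpoint() throwing. */
    void doWork(boolean checkpointSucceeds) {
        if (!checkpointSucceeds) {
            return;                    // exception path: the reset below is skipped
        }
        if (rollingUpgradeInProgress && needRollbackImage) {
            hasRollbackImage = true;   // a rollback image can only be produced during RU
        }
        if (needRollbackImage && hasRollbackImage) {
            needRollbackImage = false; // mirrors the reset in the excerpt above
        }
    }

    /** Steps 1-5 of the scenario. */
    static CheckpointModel runScenario() {
        CheckpointModel m = new CheckpointModel();
        m.rollingUpgradeInProgress = true;   // 1. start RU
        m.needRollbackImage = true;
        m.doWork(false);                     // 2. doCheckpoint() fails
        m.rollingUpgradeInProgress = false;  // 3. RU is finalized
        m.doWork(true);                      // 4-5. later checkpoints succeed, but no
        m.doWork(true);                      //      rollback image exists, so no reset
        return m;
    }

    public static void main(String[] args) {
        CheckpointModel m = runScenario();
        System.out.println("needRollbackImage = " + m.needRollbackImage);
        // Only 10 new txns and 60 s since the last checkpoint, yet it still fires.
        System.out.println("shouldCheckpoint(10, 60) = " + m.shouldCheckpoint(10, 60));
    }
}
{code}

Running `main` prints `needRollbackImage = true` and `shouldCheckpoint(10, 60) = true`: once the flag leaks, the model checkpoints on every iteration regardless of how few transactions have accumulated.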



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
