voonhous commented on PR #7891:
URL: https://github.com/apache/hudi/pull/7891#issuecomment-1457829506

   Thank you @zhangyue19921010 for assisting. I figured out the rationale behind the fix.
   
   This fix prevents a file group that may have been created by a failed clustering from being included in the second replaceCommit.
   
   Example:
   
   ```
   c0 {write FG_0, FG_1}
   r1 {cluster FG_0, FG_1}
   c2 {write FG_2, FG_3} 
   c3 {write FG_3, FG_4} 
   c4 {write FG_5, FG_6} 
   ```
   
   After archive:
   ```
   r1 {cluster FG_0, FG_1}
   c2 {write FG_2, FG_3} 
   c3 {write FG_3, FG_4} 
   c4 {write FG_5, FG_6} 
   ```
   
   r1 fails after writing some files:
   ```
   r1 {cluster FG_0, FG_1 -> FG_01_r1}
   c2 {write FG_2, FG_3} 
   c3 {write FG_3, FG_4} 
   c4 {write FG_5, FG_6} 
   ```
   
   r5 is created:
   
   ```
   r1 {cluster FG_0, FG_1 -> FG_01_r1}
   c2 {write FG_2, FG_3} 
   c3 {write FG_3, FG_4} 
   c4 {write FG_5, FG_6} 
   r5 {FG_2..FG_6, FG_01_r1}
   ```
   
   `FG_01_r1` should not be included in r5. However, since `FG_01_r1`'s instant time is r1, which is before c2, it will be deemed a committed file slice. 
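   The failure mode above can be sketched as follows. This is a minimal illustration with made-up instant times and method names, not Hudi's actual API: comparing a slice's instant time against later commits by ordering alone is not enough, because a failed clustering's slice also carries an early instant time; the slice's instant must itself be a completed instant on the timeline.
   
   ```java
   import java.util.Set;
   
   public class FileSliceVisibilityCheck {
       // Buggy heuristic: treat any slice whose instant time sorts at or before
       // the latest completed commit as committed. FG_01_r1 carries instant time
       // r1 (earlier than c2), so it slips through even though r1 failed.
       static boolean isCommittedByOrdering(String sliceInstant, String latestCommit) {
           return sliceInstant.compareTo(latestCommit) <= 0;
       }
   
       // Safer check: the slice's instant time must itself appear in the set of
       // completed instants on the timeline.
       static boolean isCommitted(String sliceInstant, Set<String> completedInstants) {
           return completedInstants.contains(sliceInstant);
       }
   
       public static void main(String[] args) {
           // Instant times as sortable strings: r1=001, c2=002, c3=003, c4=004.
           Set<String> completed = Set.of("002", "003", "004"); // r1 failed, so 001 is absent
           System.out.println(isCommittedByOrdering("001", "004")); // true  -> FG_01_r1 wrongly included
           System.out.println(isCommitted("001", completed));       // false -> correctly excluded
       }
   }
   ```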
   
   On top of that, I realised that the function `FSUtils.getCommitTime` is returning the writeToken instead of the instant time; I will raise another PR to fix the test.
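   For context, Hudi base files are named `<fileId>_<writeToken>_<instantTime>.<extension>`, so the instant time is the third underscore-separated segment. A rough sketch of the parsing (illustrative helper names, not the real `FSUtils` implementation; grabbing the wrong segment index is exactly how a writeToken gets returned in place of the instant time):
   
   ```java
   public class BaseFileNameParsing {
       // Example base file name: "a1b2c3_1-0-1_20230301120000.parquet"
       //   fileId      = a1b2c3
       //   writeToken  = 1-0-1
       //   instantTime = 20230301120000
       static String getInstantTime(String fileName) {
           String[] parts = fileName.split("_");
           String last = parts[2];                      // third segment holds the instant time
           return last.substring(0, last.indexOf('.')); // strip the file extension
       }
   
       static String getWriteToken(String fileName) {
           return fileName.split("_")[1];               // second segment is the write token
       }
   
       public static void main(String[] args) {
           String name = "a1b2c3_1-0-1_20230301120000.parquet";
           System.out.println(getInstantTime(name)); // 20230301120000
           System.out.println(getWriteToken(name));  // 1-0-1
       }
   }
   ```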
   

