cshuo commented on code in PR #13464:
URL: https://github.com/apache/hudi/pull/13464#discussion_r2157928102


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java:
##########
@@ -480,6 +480,13 @@ public static boolean isLazyFailedWritesCleanPolicy(Configuration conf) {
     return HoodieCleanConfig.FAILED_WRITES_CLEANER_POLICY.defaultValue().equalsIgnoreCase(HoodieFailedWritesCleaningPolicy.LAZY.name());
   }
 
+  /**
+   * Returns whether the writers should use blocking instant time generation.
+   */
+  public static boolean isBlockingInstantGeneration(Configuration conf) {
+    return isCowTable(conf) && isUpsertOperation(conf);

Review Comment:
   Writer pipeline for COW with upsert: rowdataToHoodie -> bucket_assign -> writer
   * the bucket assign function uses state to assign record locations.
   * the writer uses a merge handle to upsert/merge records into the assigned file group.
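   A toy, in-memory sketch of the pipeline above, to make the routing explicit. Every name in it (`KeyedRecord`, `BucketAssign`, `Writer`, the `fg-N` file IDs, the `key,value` row format) is hypothetical and is not Hudi's actual operator code; it only assumes what the comment states: bucket_assign keeps per-key state, so a later update for a key is routed back to the file group assigned earlier, and the writer merges into that file group.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not Hudi code) of the COW upsert pipeline: rowdataToHoodie -> bucket_assign -> writer. */
public class CowUpsertPipelineSketch {

    /** Stand-in for the record emitted by rowdataToHoodie: a record key plus a payload. */
    record KeyedRecord(String key, String value) {}

    /** rowdataToHoodie: parse a hypothetical "key,value" row into a keyed record. */
    static KeyedRecord rowDataToHoodie(String row) {
        String[] parts = row.split(",", 2);
        return new KeyedRecord(parts[0], parts[1]);
    }

    /** bucket_assign: keyed state remembers which file group each record key was assigned to. */
    static final class BucketAssign {
        private final Map<String, String> locationState = new HashMap<>();
        private int nextFileGroup = 0;

        String assign(KeyedRecord r) {
            // Known key -> its existing file group (update); unseen key -> a new file group (insert).
            return locationState.computeIfAbsent(r.key(), k -> "fg-" + nextFileGroup++);
        }
    }

    /** writer: merges each record into the current contents of its assigned file group. */
    static final class Writer {
        final Map<String, Map<String, String>> fileGroups = new HashMap<>();

        void write(String fileId, KeyedRecord r) {
            fileGroups.computeIfAbsent(fileId, f -> new HashMap<>()).put(r.key(), r.value());
        }
    }

    static Writer run(String... rows) {
        BucketAssign bucketAssign = new BucketAssign();
        Writer writer = new Writer();
        for (String row : rows) {
            KeyedRecord r = rowDataToHoodie(row);
            writer.write(bucketAssign.assign(r), r);
        }
        return writer;
    }

    public static void main(String[] args) {
        Writer w = run("k1,v0", "k2,a", "k1,v1");
        // The second k1 record is routed back to fg-0 by the bucket-assign state.
        System.out.println("fg-0 -> " + w.fileGroups.get("fg-0")); // fg-0 -> {k1=v1}
        System.out.println("fg-1 -> " + w.fileGroups.get("fg-1")); // fg-1 -> {k2=a}
    }
}
```

   Because the location state outlives a single instant, a record in a later instant can target a file group that an earlier, possibly still uncommitted, instant created or rewrote.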
   
   If an eager flush (or a flush triggered by a checkpoint) happens before the previous instant has been committed successfully, there are two potential problems:
   * if the file group is a new one, an exception is thrown: "FileID xxx of partition path xxx does not exist."
   * if the file group has a base file from an earlier instant, data loss may happen because the flushed records are merged with the wrong version of the base file.
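   A minimal sketch of both failure modes, assuming a toy model (hypothetical names, not Hudi's API) in which the writer merges incoming records into the latest *committed* base file of a file group, and a file group only becomes visible once the instant that wrote it commits. `runTwoInstants(false)` lets instant 2 flush while instant 1 is still in flight; `runTwoInstants(true)` models blocking instant generation, where instant 2 is only started after instant 1 commits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical, not Hudi code): why COW upsert writers need blocking instant generation. */
public class BlockingInstantSketch {

    /** One file group: base file versions visible to writers, keyed by committed instant time. */
    static final class FileGroup {
        final TreeMap<Long, Map<String, String>> committedVersions = new TreeMap<>();
    }

    /** COW-style merge: copy the latest committed base file and overlay the incoming records. */
    static Map<String, String> mergeWithLatestCommitted(String fileId, FileGroup fg, Map<String, String> records) {
        if (fg.committedVersions.isEmpty()) {
            // Failure mode 1: the file group was created by an instant that has not committed yet.
            throw new IllegalStateException("FileID " + fileId + " does not exist");
        }
        Map<String, String> merged = new HashMap<>(fg.committedVersions.lastEntry().getValue());
        merged.putAll(records);
        return merged;
    }

    /** Two upserts into one file group; returns the latest committed version afterwards. */
    static Map<String, String> runTwoInstants(boolean blocking) {
        FileGroup fg = new FileGroup();
        fg.committedVersions.put(0L, Map.of("k1", "v0")); // base file written by instant 0

        // Instant 1 updates k1 but has not committed yet.
        Map<String, String> instant1 = mergeWithLatestCommitted("f1", fg, Map.of("k1", "v1"));
        if (blocking) {
            fg.committedVersions.put(1L, instant1); // blocking: instant 2 starts only after this commit
        }

        // Instant 2 (eager flush or checkpoint-triggered flush) writes k2.
        Map<String, String> instant2 = mergeWithLatestCommitted("f1", fg, Map.of("k2", "v2"));
        if (!blocking) {
            fg.committedVersions.put(1L, instant1); // instant 1 commits late
        }
        fg.committedVersions.put(2L, instant2);
        return fg.committedVersions.lastEntry().getValue();
    }

    /** Failure mode 1: flushing into a file group whose creating instant is still in flight. */
    static boolean newFileGroupFailsWithoutBlocking() {
        FileGroup fg = new FileGroup(); // created by in-flight instant 1, nothing committed yet
        try {
            mergeWithLatestCommitted("f2", fg, Map.of("k3", "v3"));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("non-blocking k1 = " + runTwoInstants(false).get("k1")); // v0: instant 1's update is lost
        System.out.println("blocking k1 = " + runTwoInstants(true).get("k1"));      // v1
        System.out.println("new file group fails = " + newFileGroupFailsWithoutBlocking()); // true
    }
}
```

   In the non-blocking run, instant 2 merges against the instant-0 base file, so the latest committed version still holds `k1=v0` and instant 1's update disappears; with blocking, instant 2 sees instant 1's version and the result holds `k1=v1`.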



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
