[GitHub] [hudi] danny0405 commented on a diff in pull request #9035: [HUDI-6416] Completion markers for handling execution engine (spark) …

via GitHub Mon, 26 Jun 2023 20:19:22 -0700


danny0405 commented on code in PR #9035:
URL: https://github.com/apache/hudi/pull/9035#discussion_r1243093277



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -612,6 +612,20 @@ public class HoodieWriteConfig extends HoodieConfig {
       .sinceVersion("0.10.0")
       .withDocumentation("File Id Prefix provider class, that implements `org.apache.hudi.fileid.FileIdPrefixProvider`");
 
+  public static final ConfigProperty<String> ENFORCE_COMPLETION_MARKER_CHECKS = ConfigProperty
+      .key("hoodie.markers.enforce.completion.checks")
+      .defaultValue("false")
+      .sinceVersion("0.10.0")
+      .withDocumentation("Prevents the creation of duplicate data files, when multiple spark tasks are racing to "
+          + "create data files and a completed data file is already present");
+
+  public static final ConfigProperty<String> ENFORCE_FINALIZE_WRITE_CHECK = ConfigProperty
+      .key("hoodie.markers.enforce.finalize.write.check")
+      .defaultValue("false")
+      .sinceVersion("0.10.0")
+      .withDocumentation("When WriteStatus obj is lost due to engine related failures, then recomputing would involve "
+          + "re-writing all the data files. When this check is enabled it would block the rewrite from happening.");

Review Comment:
   > if writeStatus RDD blocks are found to be missing, execution engine 
(spark) would re-trigger the write stage (to recreate the write statuses).
   
   This looks like a Spark-specific issue, but the fix is placed in the 
writer code, which could affect all engines. May I know why the writeStatus 
RDD blocks could go missing here? Could we persist them before committing to 
the MDT?
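
   For reference, a minimal sketch of how the two flags in this hunk would be 
switched on by a writer. It uses only `java.util.Properties` and the two keys 
shown in the diff. The class and helper names are hypothetical, no Hudi API is 
called, and the keys exist only in this PR as proposed, so treat this as an 
illustration rather than the final configuration surface.

```java
import java.util.Properties;

// Hypothetical helper class; not part of Hudi.
public class CompletionMarkerConfigSketch {
    // Keys copied from the diff under review; both default to "false" there.
    static final String ENFORCE_COMPLETION_MARKER_CHECKS =
        "hoodie.markers.enforce.completion.checks";
    static final String ENFORCE_FINALIZE_WRITE_CHECK =
        "hoodie.markers.enforce.finalize.write.check";

    // Builds writer properties with both checks explicitly enabled.
    static Properties enableMarkerChecks() {
        Properties props = new Properties();
        props.setProperty(ENFORCE_COMPLETION_MARKER_CHECKS, "true");
        props.setProperty(ENFORCE_FINALIZE_WRITE_CHECK, "true");
        return props;
    }

    // Reads a flag, falling back to the "false" default from the diff.
    static boolean isEnabled(Properties props, String key) {
        return Boolean.parseBoolean(props.getProperty(key, "false"));
    }

    public static void main(String[] args) {
        Properties props = enableMarkerChecks();
        System.out.println(isEnabled(props, ENFORCE_COMPLETION_MARKER_CHECKS));
        System.out.println(isEnabled(props, ENFORCE_FINALIZE_WRITE_CHECK));
    }
}
```

   Because both flags default to "false", engines other than Spark would see 
no behaviour change unless a user opts in.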



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

