prashantwason commented on code in PR #8758:
URL: https://github.com/apache/hudi/pull/8758#discussion_r1231390661


##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieStorageConfig.java:
##########
@@ -78,6 +78,11 @@ public class HoodieStorageConfig extends HoodieConfig {
       .markAdvanced()
       .withDocumentation("Target file size in bytes for HFile base files.");
 
+  public static final ConfigProperty<Boolean> HFILE_WRITER_TO_ALLOW_DUPLICATES = ConfigProperty
+      .key("hoodie.hfile.writer.allow.duplicates")
+      .defaultValue(false)
+      .withDocumentation("Allows duplicates to be written into HFile.");

Review Comment:
   In theory there should not be any case where duplicates exist in a HUDI dataset, as that would not work with RI / deletes / etc. But this config allows initializing RI on datasets which already have duplicates, where fixing the duplicates would be difficult/expensive. HUDI does not yet have a deletePrepped to fix duplicates (I am working on that).
   
   So I don't know what more to add here. As this is an advanced setting, no one should be switching it on except for the one case I mentioned above.
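   To illustrate the intent, here is a minimal, self-contained sketch (not the actual Hudi `ConfigProperty` or HFile writer classes; the key string and default come from the diff, everything else is hypothetical) of how a writer could consult such a flag before appending a record whose key matches the previously written key:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of gating duplicate keys behind a boolean config.
// Only the key name and default value mirror the diff above; the
// classes and method names here are illustrative, not Hudi's API.
public class HFileDuplicateConfigSketch {
  static final String ALLOW_DUPLICATES_KEY = "hoodie.hfile.writer.allow.duplicates";
  static final boolean ALLOW_DUPLICATES_DEFAULT = false;

  // Resolve the flag from a plain property map, falling back to the default.
  static boolean allowDuplicates(Map<String, String> props) {
    return Boolean.parseBoolean(
        props.getOrDefault(ALLOW_DUPLICATES_KEY, String.valueOf(ALLOW_DUPLICATES_DEFAULT)));
  }

  // Hypothetical guard a writer might apply: a key equal to the last
  // written key is only accepted when the config is switched on.
  static boolean canWriteKey(String key, String lastWrittenKey, Map<String, String> props) {
    if (lastWrittenKey != null && lastWrittenKey.equals(key)) {
      return allowDuplicates(props);
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    System.out.println(canWriteKey("key1", "key1", props)); // default: duplicate rejected
    props.put(ALLOW_DUPLICATES_KEY, "true");
    System.out.println(canWriteKey("key1", "key1", props)); // duplicate now allowed
  }
}
```

   With the default (`false`), a duplicate key is rejected; only the one initialization scenario described above would flip it to `true`.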



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
