xushiyan commented on code in PR #8390:
URL: https://github.com/apache/hudi/pull/8390#discussion_r1182590639


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -656,6 +657,19 @@ public class HoodieWriteConfig extends HoodieConfig {
       .withDocumentation("Whether to enable commit conflict checking or not during early "
           + "conflict detection.");
 
+  public static final ConfigProperty<Boolean> SAMPLE_WRITES_ENABLED = ConfigProperty
+      .key("hoodie.write.sample.writes.enabled")
+      .defaultValue(false)
+      .withDocumentation("Set this to true to sample from the first batch of records and write to the auxiliary path, before writing to the table."
+          + "The sampled records are used to calculate the average record size. The relevant write client will have `" + COPY_ON_WRITE_RECORD_SIZE_ESTIMATE.key()
+          + "` being overwritten by the calculated result.");
+
+  public static final ConfigProperty<Integer> SAMPLE_WRITES_SIZE = ConfigProperty
+      .key("hoodie.write.sample.writes.size")
+      .defaultValue(2000)

Review Comment:
   Sounds fair to increase it, given this is a one-time write. To balance a bit,
   we can go with 5k: for large payloads, more sampling won't make much
   difference.
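
   For reference, a minimal sketch of the estimate this setting feeds, assuming
   the average is simply the total bytes written for the sample divided by the
   number of sampled records. The class name, method name, and fallback
   parameter are hypothetical and are not taken from the PR; the actual
   implementation may differ.

```java
// Hypothetical illustration only; not the code in this PR.
final class SampleRecordSizeSketch {

  /**
   * Estimates the average record size from a sample write.
   *
   * @param totalBytesWritten bytes written for the sampled records
   * @param recordCount       number of sampled records (e.g. the configured sample size)
   * @param fallbackSize      value to use when nothing was sampled
   * @return estimated bytes per record, never less than 1
   */
  static long averageRecordSize(long totalBytesWritten, long recordCount, long fallbackSize) {
    if (recordCount <= 0 || totalBytesWritten <= 0) {
      // Nothing usable was sampled, so keep the existing estimate.
      return fallbackSize;
    }
    // Integer division is enough here: the result is only a sizing hint.
    return Math.max(1L, totalBytesWritten / recordCount);
  }
}
```

   Under this assumption the estimate is a plain mean, so once the sample is
   large enough to be representative, adding more records changes it very
   little.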


