danny0405 commented on code in PR #7362:
URL: https://github.com/apache/hudi/pull/7362#discussion_r1040532787
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -179,6 +179,20 @@ public class HoodieCompactionConfig extends HoodieConfig {
+ "record size estimate compute dynamically based on commit
metadata. "
+ " This is critical in computing the insert parallelism and
bin-packing inserts into small files.");
+ public static final ConfigProperty<String>
COPY_ON_WRITE_RECORD_DYNAMIC_SAMPLE_MAXNUM = ConfigProperty
+ .key("hoodie.copyonwrite.record.dynamic.sample.maxnum")
+ .defaultValue(String.valueOf(100))
+ .withDocumentation("Although dynamic sampling is adopted, if the
record size assumed by the user is unreasonable during the first write
execution, "
+ + "files that are too large or too small will be generated.
Therefore, sampling is conducted from the data set during the first write
operation. "
+ + "In order to ensure performance, this parameter controls
the absolute value of sampling.");
Review Comment:
Can we avoid these options? For most cases, the per-record size should be fairly uniform, so we could instead take a default value that is reasonable for the common case.
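For context, the sampling cap the new option describes could be sketched as follows. This is a hypothetical standalone illustration of the idea (estimate average record size from at most `maxnum` sampled records, falling back to the configured estimate when no data is available); the class and method names are invented here and are not Hudi APIs or the PR's actual implementation.

```java
import java.util.List;

// Hypothetical sketch of dynamic record-size sampling with an absolute
// cap on the number of sampled records, as the proposed
// hoodie.copyonwrite.record.dynamic.sample.maxnum option would control.
public class RecordSizeSampler {

  /**
   * Estimate the average serialized record size by sampling at most
   * {@code sampleMax} records. Returns {@code fallbackSize} (e.g. the
   * user-configured estimate) when there is nothing to sample.
   */
  static long estimateAvgRecordSize(List<byte[]> records, int sampleMax, long fallbackSize) {
    if (records == null || records.isEmpty() || sampleMax <= 0) {
      return fallbackSize; // no data to sample: fall back to the configured estimate
    }
    int n = Math.min(records.size(), sampleMax); // cap sampling cost
    long totalBytes = 0;
    for (int i = 0; i < n; i++) {
      totalBytes += records.get(i).length;
    }
    return totalBytes / n;
  }
}
```

With a cap like this, the first write pays a bounded sampling cost even on large inputs, while later writes can switch to the commit-metadata-based estimate the earlier option describes.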
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]