[GitHub] [hudi] xicm commented on a diff in pull request #7226: [HUDI-5018] Make user-provided copyOnWriteRecordSizeEstimate first precedence

GitBox Mon, 21 Nov 2022 22:41:30 -0800


xicm commented on code in PR #7226:
URL: https://github.com/apache/hudi/pull/7226#discussion_r1028925825



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java:
##########
@@ -367,9 +368,19 @@ public int getPartition(Object key) {
   /**
    * Obtains the average record size based on records written during previous commits. Used for estimating how many
    * records pack into one file.
+   * Respect user setting by following the precedence as below
+   * 1) if user sets a value, then use it as is
+   * 2) if user not setting it, infer from timeline commit metadata
+   * 3) if timeline is empty, use a default (current: 1024)
    */
   protected static long averageBytesPerRecord(HoodieTimeline commitTimeline, HoodieWriteConfig hoodieWriteConfig) {
+    long defaultAvgSize = Integer.parseInt(HoodieCompactionConfig.COPY_ON_WRITE_RECORD_SIZE_ESTIMATE.defaultValue());
     long avgSize = hoodieWriteConfig.getCopyOnWriteRecordSizeEstimate();
+
+    if (avgSize != defaultAvgSize) {

Review Comment:
   Hi @danny0405, do you mean adding a comment that explains why we ignore the write stats from the commit metadata when the user has set a value, and how to set that value more accurately?
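
   For reference, here is a minimal standalone sketch of the precedence described in the Javadoc above. It is not the PR's code: the class name `RecordSizeEstimateSketch`, the method `resolve`, and the `OptionalLong` parameter (a stand-in for whatever the timeline commit metadata would yield) are hypothetical, and the default of 1024 is taken from the Javadoc.

```java
import java.util.OptionalLong;

// Hypothetical illustration only; none of these names exist in Hudi.
class RecordSizeEstimateSketch {

    // Default mentioned in the Javadoc ("current: 1024").
    static final long DEFAULT_AVG_SIZE = 1024L;

    /**
     * Resolves the average record size using the three-step precedence:
     * 1) a configured value that differs from the default wins,
     * 2) otherwise a value inferred from commit metadata is used,
     * 3) otherwise the default is used.
     *
     * @param configured          value read from the write config
     * @param inferredFromCommits estimate derived from previous commits,
     *                            empty when the timeline has no usable stats
     */
    static long resolve(long configured, OptionalLong inferredFromCommits) {
        if (configured != DEFAULT_AVG_SIZE) {
            // Treated as "user set it": commit write stats are ignored.
            return configured;
        }
        return inferredFromCommits.orElse(DEFAULT_AVG_SIZE);
    }
}
```

   One consequence of comparing against the default, at least in this sketch: a user who explicitly sets the value to 1024 cannot be told apart from a user who set nothing, so in that case the commit metadata estimate would still be used.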



