helpta commented on code in PR #7255:
URL: https://github.com/apache/hudi/pull/7255#discussion_r1111450040
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java:
##########
@@ -372,7 +372,7 @@ protected static long averageBytesPerRecord(HoodieTimeline commitTimeline, Hoodi
     long avgSize = hoodieWriteConfig.getCopyOnWriteRecordSizeEstimate();
     long fileSizeThreshold = (long) (hoodieWriteConfig.getRecordSizeEstimationThreshold() * hoodieWriteConfig.getParquetSmallFileLimit());
     try {
-      if (!commitTimeline.empty()) {
+      if (hoodieWriteConfig.getRecordSizeEstimationThreshold() > 0 && !commitTimeline.empty()) {
         // Go over the reverse ordered commits to get a more recent estimate of average record size.
         Iterator<HoodieInstant> instants = commitTimeline.getReverseOrderedInstants().iterator();
Review Comment:
> if (hoodieWriteConfig.getRecordSizeEstimationThreshold() > 0 && !commitTimeline.empty())
Shouldn't we first check whether the default value has been overridden by the user
(org.apache.hudi.config.HoodieCompactionConfig#_COPY_ON_WRITE_RECORD_SIZE_ESTIMATE)?
I think that check should take priority.
With the logic as adjusted above, tuning the threshold can only pin avgSize to the
fixed default of 1024; it does not give users the ability to customize avgSize for
their own workloads.
If I have misunderstood, please let me know. Thanks.
@danny0405 @honeyaya @codope @nsivabalan
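
A minimal sketch of the priority order the reviewer seems to be asking for (this is not the actual Hudi implementation; the class, method, and the `userConfiguredSize`/`timelineEstimate` parameters are hypothetical): an explicitly configured record size estimate should win over the timeline-based average, and the fixed default of 1024 should only be the last resort.

```java
// Hypothetical sketch, NOT actual Hudi code: illustrates the priority order
// discussed above for choosing the average-record-size estimate.
public class RecordSizeEstimateSketch {

  // Hudi's documented default for hoodie.copyonwrite.record.size.estimate.
  static final long DEFAULT_RECORD_SIZE = 1024L;

  /**
   * @param userConfiguredSize null when the user did not explicitly set the
   *                           record size estimate in the write config
   * @param timelineEstimate   average record size derived from recent commit
   *                           metadata; non-positive when unavailable (e.g.
   *                           empty timeline or threshold disabled)
   */
  static long averageBytesPerRecord(Long userConfiguredSize, long timelineEstimate) {
    if (userConfiguredSize != null) {
      return userConfiguredSize; // explicit user setting takes first priority
    }
    if (timelineEstimate > 0) {
      return timelineEstimate;   // otherwise, estimate from commit history
    }
    return DEFAULT_RECORD_SIZE;  // last resort: the fixed built-in default
  }

  public static void main(String[] args) {
    System.out.println(averageBytesPerRecord(2048L, 512)); // 2048
    System.out.println(averageBytesPerRecord(null, 512));  // 512
    System.out.println(averageBytesPerRecord(null, -1));   // 1024
  }
}
```

Under this ordering, adjusting the estimation threshold never overrides a value the user set explicitly; it only controls whether the timeline-based estimate or the default is used.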
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.