helpta commented on code in PR #7255:
URL: https://github.com/apache/hudi/pull/7255#discussion_r1111450040
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java:
##########
@@ -372,7 +372,7 @@ protected static long averageBytesPerRecord(HoodieTimeline commitTimeline, Hoodi
     long avgSize = hoodieWriteConfig.getCopyOnWriteRecordSizeEstimate();
     long fileSizeThreshold = (long) (hoodieWriteConfig.getRecordSizeEstimationThreshold() * hoodieWriteConfig.getParquetSmallFileLimit());
     try {
-      if (!commitTimeline.empty()) {
+      if (hoodieWriteConfig.getRecordSizeEstimationThreshold() > 0 && !commitTimeline.empty()) {
         // Go over the reverse ordered commits to get a more recent estimate of average record size.
         Iterator<HoodieInstant> instants = commitTimeline.getReverseOrderedInstants().iterator();
Review Comment:
> if (hoodieWriteConfig.getRecordSizeEstimationThreshold() > 0 && !commitTimeline.empty())
Shouldn't we first check whether the default value has been overridden by the user
(org.apache.hudi.config.HoodieCompactionConfig#_COPY_ON_WRITE_RECORD_SIZE_ESTIMATE)?
I think that check should take priority.
With the logic as adjusted above, tuning the threshold can only pin avgSize to the
fixed default of 1024; it does not give users the ability to customize avgSize for
their own workloads.
If I have misunderstood, please let me know. Thanks.
@danny0405 @honeyaya @codope @nsivabalan
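
A minimal sketch of the priority order the reviewer seems to be asking for (this is not the actual Hudi implementation; the class, method, and the `userConfiguredSize`/`timelineEstimate` parameters are hypothetical): an explicitly configured record size estimate should win over the timeline-based average, and the fixed default of 1024 should only be the last resort.

```java
// Hypothetical sketch, NOT actual Hudi code: illustrates the priority order
// discussed above for choosing the average-record-size estimate.
public class RecordSizeEstimateSketch {

  // Hudi's documented default for hoodie.copyonwrite.record.size.estimate.
  static final long DEFAULT_RECORD_SIZE = 1024L;

  /**
   * @param userConfiguredSize null when the user did not explicitly set the
   *                           record size estimate in the write config
   * @param timelineEstimate   average record size derived from recent commit
   *                           metadata; non-positive when unavailable (e.g.
   *                           empty timeline or threshold disabled)
   */
  static long averageBytesPerRecord(Long userConfiguredSize, long timelineEstimate) {
    if (userConfiguredSize != null) {
      return userConfiguredSize; // explicit user setting takes first priority
    }
    if (timelineEstimate > 0) {
      return timelineEstimate;   // otherwise, estimate from commit history
    }
    return DEFAULT_RECORD_SIZE;  // last resort: the fixed built-in default
  }

  public static void main(String[] args) {
    System.out.println(averageBytesPerRecord(2048L, 512)); // 2048
    System.out.println(averageBytesPerRecord(null, 512));  // 512
    System.out.println(averageBytesPerRecord(null, -1));   // 1024
  }
}
```

Under this ordering, adjusting the estimation threshold never overrides a value the user set explicitly; it only controls whether the timeline-based estimate or the default is used.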
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.