garyli1019 commented on a change in pull request #1602:
URL: https://github.com/apache/hudi/pull/1602#discussion_r435647198
##########
File path:
hudi-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##########
@@ -301,7 +301,7 @@ protected static long averageBytesPerRecord(HoodieTimeline commitTimeline, int d
.fromBytes(commitTimeline.getInstantDetails(instant).get(),
HoodieCommitMetadata.class);
long totalBytesWritten = commitMetadata.fetchTotalBytesWritten();
long totalRecordsWritten = commitMetadata.fetchTotalRecordsWritten();
- if (totalBytesWritten > 0 && totalRecordsWritten > 0) {
+ if (totalBytesWritten > hoodieWriteConfig.getParquetSmallFileLimit() && totalRecordsWritten > 0) {
Review comment:
This bug would happen when a small commit is made to a new partition. If we
make a small commit to an existing partition, it will very likely be merged into
an existing file, so `totalBytesWritten` should be a normal size. If there is at
least one file in the timeline larger than a small file, then the default value
should not be used.
I feel like adding a new config here would increase the complexity and could
be difficult for the user to understand. Should we do that for this edge case?
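
To make the edge case concrete, here is a minimal, self-contained sketch of the check in the diff above. It is not the actual `UpsertPartitioner` code: the `CommitStats` class, the `commitsNewestFirst` list, and the `defaultRecordSizeEstimate` parameter are hypothetical stand-ins for the commit metadata, the timeline walk, and the configured default, and the ceiling division is an assumption.

```java
import java.util.Arrays;
import java.util.List;

class AvgRecordSizeSketch {

  /** Hypothetical stand-in for the two totals read from the commit metadata. */
  static final class CommitStats {
    final long totalBytesWritten;
    final long totalRecordsWritten;

    CommitStats(long totalBytesWritten, long totalRecordsWritten) {
      this.totalBytesWritten = totalBytesWritten;
      this.totalRecordsWritten = totalRecordsWritten;
    }
  }

  /**
   * Returns the average record size from the most recent commit that wrote
   * more than smallFileLimitBytes, or the default estimate if none qualifies.
   */
  static long averageBytesPerRecord(List<CommitStats> commitsNewestFirst,
                                    long smallFileLimitBytes,
                                    long defaultRecordSizeEstimate) {
    for (CommitStats commit : commitsNewestFirst) {
      // Same condition as the "+" line in the diff: skip commits that are too small.
      if (commit.totalBytesWritten > smallFileLimitBytes && commit.totalRecordsWritten > 0) {
        // Assumption: round up so the estimate is never zero.
        return (long) Math.ceil((1.0 * commit.totalBytesWritten) / commit.totalRecordsWritten);
      }
    }
    return defaultRecordSizeEstimate;
  }

  public static void main(String[] args) {
    List<CommitStats> commits = Arrays.asList(
        new CommitStats(500_000L, 10L),             // small commit to a new partition
        new CommitStats(120_000_000L, 1_000_000L)); // normal-sized earlier commit

    // Old condition (threshold 0): the small commit decides the estimate.
    System.out.println(averageBytesPerRecord(commits, 0L, 1024L));
    // New condition (100 MB threshold): the small commit is skipped.
    System.out.println(averageBytesPerRecord(commits, 100L * 1024 * 1024, 1024L));
  }
}
```

With these made-up numbers, the old condition yields 50000 bytes per record from the small commit, while the thresholded condition falls through to the earlier commit and yields 120. The default is only returned when no commit in the list exceeds the threshold.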