minihippo commented on a change in pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#discussion_r769779554
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/deltacommit/AbstractSparkDeltaCommitActionExecutor.java
##########
@@ -74,8 +74,8 @@ public Partitioner getUpsertPartitioner(WorkloadProfile profile) {
public Iterator<List<WriteStatus>> handleUpdate(String partitionPath, String fileId,
Iterator<HoodieRecord<T>> recordItr) throws IOException {
LOG.info("Merging updates for commit " + instantTime + " for file " + fileId);
-
- if (!table.getIndex().canIndexLogFiles() && mergeOnReadUpsertPartitioner.getSmallFileIds().contains(fileId)) {
+ if (!table.getIndex().canIndexLogFiles() && mergeOnReadUpsertPartitioner != null
Review comment:
`mergeOnReadUpsertPartitioner` is not initialized when the bucket partitioner
is used, hence the null check.
`AbstractSparkDeltaCommitActionExecutor` holds the partitioner only so that it
can rewrite small parquet files on update. Getting the small files from the
partitioner seems strange here. IMO, small files should be maintained by the
file system view. Shall I create another PR to resolve this?
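
For reference, a minimal sketch of the guard being discussed. The diff above is cut off after `!= null`, so the rest of the condition is an assumption here: it is taken to be the original `getSmallFileIds().contains(fileId)` check. `SmallFileGuardSketch`, `UpsertPartitionerStub`, and `shouldRewriteSmallFile` are hypothetical stand-ins for illustration, not Hudi classes or methods.

```java
import java.util.Set;

// Hypothetical sketch; none of these types are part of Hudi.
public class SmallFileGuardSketch {

  /** Stand-in for a partitioner that tracks the ids of small files. */
  static final class UpsertPartitionerStub {
    private final Set<String> smallFileIds;

    UpsertPartitionerStub(Set<String> smallFileIds) {
      this.smallFileIds = smallFileIds;
    }

    Set<String> getSmallFileIds() {
      return smallFileIds;
    }
  }

  /**
   * Returns true only when the index cannot index log files, a partitioner
   * is present, and that partitioner lists fileId as a small file.
   * A null partitioner (assumed here to be the bucket-partitioner case)
   * short-circuits to false instead of throwing a NullPointerException.
   */
  static boolean shouldRewriteSmallFile(boolean canIndexLogFiles,
                                        UpsertPartitionerStub partitioner,
                                        String fileId) {
    return !canIndexLogFiles
        && partitioner != null
        && partitioner.getSmallFileIds().contains(fileId);
  }
}
```

The null check keeps the update path working, but the executor still depends on the partitioner for small-file information, which is the coupling questioned above.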