minihippo commented on a change in pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#discussion_r769779554



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/deltacommit/AbstractSparkDeltaCommitActionExecutor.java
##########
@@ -74,8 +74,8 @@ public Partitioner getUpsertPartitioner(WorkloadProfile profile) {
   public Iterator<List<WriteStatus>> handleUpdate(String partitionPath, String fileId,
       Iterator<HoodieRecord<T>> recordItr) throws IOException {
     LOG.info("Merging updates for commit " + instantTime + " for file " + fileId);
-
-    if (!table.getIndex().canIndexLogFiles() && mergeOnReadUpsertPartitioner.getSmallFileIds().contains(fileId)) {
+    if (!table.getIndex().canIndexLogFiles() && mergeOnReadUpsertPartitioner != null

Review comment:
   `mergeOnReadUpsertPartitioner` is not initialized when the bucket
partitioner is used, hence the null check.
   `AbstractSparkDeltaCommitActionExecutor` holds the partitioner so that it
can rewrite small parquet files on update. Getting the small files from the
partitioner seems strange here. IMO, small files should be maintained by the
file system view. Shall I create another PR to resolve this?
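
   For illustration only, the guard discussed above can be sketched as a
standalone check. This is a minimal sketch, not the Hudi code: the class
`SmallFileGuardSketch`, the interface `SmallFileSource`, and the method
`shouldRewriteSmallFile` are hypothetical names standing in for the
partitioner and the condition in `handleUpdate`. Only `getSmallFileIds()` and
the shape of the condition come from the diff.

```java
import java.util.Collections;
import java.util.Set;

public class SmallFileGuardSketch {

  /** Hypothetical stand-in for the upsert partitioner; not a Hudi type. */
  public interface SmallFileSource {
    Set<String> getSmallFileIds();
  }

  /**
   * Mirrors the shape of the condition in the diff: the small-file path is
   * taken only if the index cannot index log files, a partitioner that
   * tracks small files is present, and the file id is one of its small files.
   * A null partitioner (assumed here to model the bucket partitioner case)
   * short-circuits to false instead of throwing a NullPointerException.
   */
  public static boolean shouldRewriteSmallFile(
      boolean canIndexLogFiles, SmallFileSource partitioner, String fileId) {
    return !canIndexLogFiles
        && partitioner != null
        && partitioner.getSmallFileIds().contains(fileId);
  }

  public static void main(String[] args) {
    SmallFileSource upsert = () -> Collections.singleton("file-1");

    // Partitioner present and file is small: take the small-file path.
    System.out.println(shouldRewriteSmallFile(false, upsert, "file-1"));
    // Partitioner absent: skip the small-file path without an NPE.
    System.out.println(shouldRewriteSmallFile(false, null, "file-1"));
    // Index can index log files: skip regardless of the partitioner.
    System.out.println(shouldRewriteSmallFile(true, upsert, "file-1"));
  }
}
```

   The null check keeps the small-file lookup out of the way for partitioners
that do not track small files. It does not address the concern raised above
about where small files should be maintained.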




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


