xushiyan commented on issue #3821:
URL: https://github.com/apache/hudi/issues/3821#issuecomment-946383159


   @rohit-m-99 this is likely because the dataset is non-partitioned:
   
   
https://github.com/apache/hudi/blob/dbcf60f370e93ab490cf82e677387a07ea743cda/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java#L254
   
   getSmallFilesForPartitions() is parallelized over partitions, so with a 
single partition there is nothing to spread across executors. Try using 10-20 
partitions; that may bring this down to <50s and make use of multiple executors.
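
One possible way to get to roughly 10-20 partitions is to derive a bucket column from the record key and use it as the partition path. The sketch below is only an illustration under assumptions: the bucket count, the `bucket_for` helper, the table name, and the column names `id`, `ts`, and `bucket` are hypothetical and not taken from the issue. The `hoodie.*` keys are standard Hudi datasource write options. It would apply to a newly written table, not to re-partitioning an existing one in place.

```python
import zlib

# Hypothetical bucket count, chosen from the suggested 10-20 range.
NUM_BUCKETS = 16


def bucket_for(record_key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a record key to a stable bucket number in [0, num_buckets)."""
    # crc32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


# Hudi write options. The table name and the column names "id", "ts" and
# "bucket" are assumptions made for this sketch.
hudi_options = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
    # Partitioning on the derived bucket column gives Hudi several
    # partition paths to parallelize over.
    "hoodie.datasource.write.partitionpath.field": "bucket",
}
```

In a Spark job, the `bucket` column would be added to the DataFrame before the write (for example with a UDF wrapping `bucket_for`), and `hudi_options` passed to the writer via `.options(**hudi_options)`. Whether this reaches the <50s estimate depends on the data and the cluster.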


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


