xushiyan commented on issue #3821: URL: https://github.com/apache/hudi/issues/3821#issuecomment-946383159
@rohit-m-99 this is likely because the dataset is non-partitioned. `getSmallFilesForPartitions()` (see https://github.com/apache/hudi/blob/dbcf60f370e93ab490cf82e677387a07ea743cda/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java#L254) is parallelized over partitions, so a non-partitioned table gets little parallelism from that step. Try using 10-20 partitions; that may bring this down to <50s and make use of multiple executors.
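
For illustration, here is a rough sketch of one way to get 10-20 partitions when the data has no natural partition column: derive a stable bucket from the record key and point the Hudi writer at it. The table name, the column names (`id`, `ts`, `bucket`), the bucket count of 16, and both helper functions are hypothetical; only the `hoodie.*` option keys are standard Hudi datasource write configs. This assumes a freshly written table, since changing the partition layout of an existing table is a separate concern.

```python
import zlib

# Assumption: 16 buckets, chosen from the 10-20 range suggested above.
NUM_BUCKETS = 16


def bucket_for(record_key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Map a record key to a stable bucket name usable as a partition path.

    crc32 is used instead of Python's built-in hash() so the result is
    the same across processes and runs.
    """
    index = zlib.crc32(record_key.encode("utf-8")) % num_buckets
    return f"bucket_{index:02d}"


def hudi_upsert_options(table_name: str, record_key_field: str,
                        precombine_field: str, partition_field: str) -> dict:
    """Build Hudi datasource write options for a partitioned upsert."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key_field,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


# Hypothetical table and column names, for illustration only.
options = hudi_upsert_options("my_table", "id", "ts", "bucket")

# Sanity check: many keys map onto at most NUM_BUCKETS distinct partitions.
buckets = {bucket_for(f"key-{i}") for i in range(1000)}
```

With PySpark, the `bucket` column would be added to the DataFrame (for example with a UDF wrapping `bucket_for`) and the write would look roughly like `df.write.format("hudi").options(**options).mode("append").save(base_path)`, where `base_path` is your table location.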
