[GitHub] [hudi] xushiyan commented on issue #3821: [SUPPORT] Ingestion taking very long time getting small files from partitions/

GitBox Mon, 18 Oct 2021 22:39:34 -0700


xushiyan commented on issue #3821:
URL: https://github.com/apache/hudi/issues/3821#issuecomment-946383159



   @rohit-m-99 this is likely because the dataset is non-partitioned.
   
   
https://github.com/apache/hudi/blob/dbcf60f370e93ab490cf82e677387a07ea743cda/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java#L254
   
   getSmallFilesForPartitions() is parallelized over partitions, so with a 
non-partitioned dataset it effectively runs as a single task. Try using 10-20 
partitions; that may bring this step down to <50s and make use of multiple 
executors.
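
To illustrate why the partition count matters here, below is a minimal sketch in plain Python. It is not Hudi code: the function names, the in-memory file listing, and the `max_workers` cap (standing in for available executor cores) are all hypothetical. It only models the idea that one task is created per partition path, so usable parallelism is capped by the number of partitions.

```python
from concurrent.futures import ThreadPoolExecutor


def small_files_for_partition(partition_path, files_by_partition, small_file_limit_bytes):
    """Hypothetical stand-in for the per-partition small-file lookup."""
    files = files_by_partition.get(partition_path, [])
    return [name for name, size in files if size < small_file_limit_bytes]


def get_small_files_for_partitions(partition_paths, files_by_partition,
                                   small_file_limit_bytes, max_workers=8):
    """Run one lookup task per partition path.

    Returns (small files per partition, number of workers actually used).
    max_workers is an assumed stand-in for the available executor cores.
    """
    if not partition_paths:
        return {}, 0
    # One task per partition: a single partition path means a single task,
    # no matter how many workers are available.
    workers = min(len(partition_paths), max_workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda p: (p, small_files_for_partition(
                p, files_by_partition, small_file_limit_bytes)),
            partition_paths,
        )
        return dict(results), workers
```

With a single (empty) partition path, as in a non-partitioned table, the sketch uses one worker regardless of `max_workers`; with 12 partition paths it uses all 8 assumed workers.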


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
