KnightChess commented on issue #10418: URL: https://github.com/apache/hudi/issues/10418#issuecomment-1877175659
@zhangjw123321 It looks like `hoodie.bulkinsert.shuffle.parallelism` cannot take effect on a non-partitioned table in the code. In the Spark UI, it may be that you have not set `spark.default.parallelism`, so `reduceByKey` falls back to the parent RDD's partition count. Can you try `set spark.default.parallelism=100;`? I think it will reduce the parallelism in `stage 10` to 100.
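For reference, a sketch of how that setting could be applied, either in the SQL session or when submitting the job (the `100` value and the job arguments here are placeholders, not taken from your setup):

```shell
# Option 1: inside a spark-sql session, as suggested above:
#   set spark.default.parallelism=100;

# Option 2: pass it as a conf when launching the job
# (replace the placeholder job jar/args with your own):
spark-submit \
  --conf spark.default.parallelism=100 \
  your-hudi-job.jar
```

Without this setting, `reduceByKey` called with no explicit partition count inherits the number of partitions from the upstream RDD, which is why the stage shows the larger parallelism.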
