zhangjw123321 commented on issue #10418: URL: https://github.com/apache/hudi/issues/10418#issuecomment-1878335425
  set hoodie.spark.sql.insert.into.operation=bulk_insert;
  set hoodie.bulkinsert.shuffle.parallelism=100;
  set spark.default.parallelism=100;
  set spark.sql.shuffle.partitions=100;

  After setting these parameters, the number of Hudi files on HDFS is still 10000.

  > @zhangjw123321 It looks like `hoodie.bulkinsert.shuffle.parallelism` cannot take effect on non-partitioned tables in the code. From the Spark UI, it may be that you had not set `spark.default.parallelism`, so `reduceByKey` falls back to the parent RDD's partition count. Can you try `set spark.default.parallelism=100;`? I think it will reduce the parallelism in stage 10 to 100.
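The fallback described in the quoted reply can be sketched with a toy model. This is plain Python, not Spark, and the function name is hypothetical; it mirrors the documented sizing rule of Spark's default partitioner (the real `org.apache.spark.Partitioner.defaultPartitioner` has extra cases when a parent RDD already carries a partitioner):

```python
# Toy model (not Spark itself) of how Spark sizes a shuffle such as
# reduceByKey when no explicit partition count is passed:
# - if spark.default.parallelism is set, that value is used;
# - otherwise the largest partition count among the parent RDDs wins.

def default_shuffle_partitions(parent_partition_counts, default_parallelism=None):
    """Hypothetical helper modeling Spark's default-partitioner sizing rule."""
    if default_parallelism is not None:
        return default_parallelism
    return max(parent_partition_counts)

# Without spark.default.parallelism, a parent RDD with 10000 partitions
# drives a 10000-task shuffle, hence 10000 output files:
print(default_shuffle_partitions([10000]))        # 10000
# With set spark.default.parallelism=100, the shuffle drops to 100 tasks:
print(default_shuffle_partitions([10000], 100))   # 100
```

This is why, in the scenario above, leaving `spark.default.parallelism` unset lets the 10000-partition parent RDD dictate the parallelism of the `reduceByKey` stage.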
