ad1happy2go commented on issue #8532:
URL: https://github.com/apache/hudi/issues/8532#issuecomment-1573386158

Please find responses to your queries below.
   
**how is hoodie.parquet.max.file.size and shuffle.parallelism related?**
A smaller `hoodie.parquet.max.file.size` produces more file groups for the same data volume. If the operation ends up updating too many file groups, you should set a higher `shuffle.parallelism` so that the writes can be parallelised across them.
**what causes shuffling of data**
During bulk insert, Hudi shuffles and sorts the data in order to give good read performance. You can disable these steps to avoid the shuffle, which should speed up the write. The relevant configs are `write.bulk_insert.sort_input` and `write.bulk_insert.shuffle_input`.
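As a rough illustration, the two flags above could be passed in the sink options of a Hudi Flink write. This is a hypothetical sketch: the path and table settings are made-up placeholders, and only the two flag keys come from the comment itself.

```python
# Hypothetical sketch of Hudi Flink sink options for a bulk insert
# with the sort and shuffle stages disabled. The "path" value is a
# placeholder; the two write.bulk_insert.* keys are the flags
# discussed above.
flink_sink_options = {
    "connector": "hudi",
    "path": "s3://my-bucket/hudi/my_table",   # placeholder location
    "write.operation": "bulk_insert",
    # Skip sorting and shuffling of the input to speed up the write,
    # at the cost of a less read-optimized file layout:
    "write.bulk_insert.sort_input": "false",
    "write.bulk_insert.shuffle_input": "false",
}
```

Disabling both flags trades read-time layout quality for faster ingestion, so it fits best for one-off backfills rather than tables that are queried heavily afterwards.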
`hoodie.bulkinsert.shuffle.parallelism` - This property should scale directly with the number of file groups being written.
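On the Spark side, the same sizing advice might look like the option map below. This is a sketch, not a recommendation: the parallelism value of 200 and the base path are illustrative assumptions, while the config keys themselves are standard Hudi write options.

```python
# Hypothetical sketch: Hudi write options for a Spark bulk insert,
# with shuffle parallelism sized to roughly the number of file
# groups being written (200 here is an illustrative value).
hudi_options = {
    "hoodie.datasource.write.operation": "bulk_insert",
    # Target parquet file size; smaller files mean more file groups:
    "hoodie.parquet.max.file.size": str(128 * 1024 * 1024),  # 128 MB
    # Should track the expected number of file groups:
    "hoodie.bulkinsert.shuffle.parallelism": "200",
}

# Typical usage (not executed here):
# df.write.format("hudi").options(**hudi_options) \
#   .mode("append").save("s3://my-bucket/hudi/my_table")
```

The key point is that the two settings interact: lowering the max file size increases the file-group count, which in turn argues for a higher shuffle parallelism.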
