wkhappy1 opened a new issue, #11064:
URL: https://github.com/apache/hudi/issues/11064

   
   In the stage "Doing partition and writing data: tenant",
   I see write data skew.
   <img width="945" alt="1" src="https://github.com/apache/hudi/assets/54095696/1eb88afe-0608-41f2-92bd-de1f18974694">
   
   This stage took 9.8 min.
   
   <img width="960" alt="2" src="https://github.com/apache/hudi/assets/54095696/1deeef84-0f68-4d5d-88b8-0b26e83592da">
   
   The task with index 2 alone took 9.8 min.
   
   <img width="959" alt="3" src="https://github.com/apache/hudi/assets/54095696/527ca82e-baa4-4cbc-a658-5994c93ca4da">
   
   This task wrote a Parquet file of 792329527 bytes, which is much larger than the other files.
   
   
   Is there a parameter I can tune to avoid this?
   
   Hudi config:
   hoodie.insert.shuffle.parallelism 200
   hoodie.upsert.shuffle.parallelism 200
   INDEX_TYPE BLOOM
   hoodie.parquet.compression.ratio 0.1
   hoodie.parquet.max.file.size 125829120
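
To put the numbers side by side, here is a minimal sketch that compares the observed file size with the configured `hoodie.parquet.max.file.size`. It uses only the values reported in this issue. The key `hoodie.index.type` is an assumption about what `INDEX_TYPE` refers to, and the variable names are just for illustration.

```python
# Hudi write options as reported in this issue (plain dict, no Spark needed).
hudi_options = {
    "hoodie.insert.shuffle.parallelism": "200",
    "hoodie.upsert.shuffle.parallelism": "200",
    "hoodie.index.type": "BLOOM",  # assumption: INDEX_TYPE maps to this key
    "hoodie.parquet.compression.ratio": "0.1",
    "hoodie.parquet.max.file.size": "125829120",
}

# Size in bytes of the Parquet file written by the slow task (index 2).
observed_file_bytes = 792329527

max_file_bytes = int(hudi_options["hoodie.parquet.max.file.size"])
oversize_ratio = observed_file_bytes / max_file_bytes

print(f"configured max: {max_file_bytes / 1024**2:.0f} MiB")
print(f"observed file:  {observed_file_bytes / 1024**2:.1f} MiB")
print(f"ratio:          {oversize_ratio:.2f}x")
```

The file from the slow task comes out at roughly 6.3 times the configured maximum. In a PySpark job a dict like this would typically be passed as `df.write.format("hudi").options(**hudi_options)`, though the actual job code is not shown here.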
   
   **Environment Description**
   
   * Hudi version :0.11.1
   
   * Spark version :3.2.2
   
   * Hive version :3.1.3
   
   * Hadoop version :3.3.2
   
    
   
   

