wkhappy1 opened a new issue, #11064: URL: https://github.com/apache/hudi/issues/11064
When running the "Doing partition and writing data: tenant" stage, I see that the data write is skewed:

<img width="945" alt="1" src="https://github.com/apache/hudi/assets/54095696/1eb88afe-0608-41f2-92bd-de1f18974694">

This stage takes 9.8 min:

<img width="960" alt="2" src="https://github.com/apache/hudi/assets/54095696/1deeef84-0f68-4d5d-88b8-0b26e83592da">

The task with index 2 alone takes 9.8 min:

<img width="959" alt="3" src="https://github.com/apache/hudi/assets/54095696/527ca82e-baa4-4cbc-a658-5994c93ca4da">

This task writes a parquet file of size 792329527, much larger than the files written by the other tasks. Is there a parameter I can tune to fix this?

Hudi config:

* `hoodie.insert.shuffle.parallelism`: 200
* `hoodie.upsert.shuffle.parallelism`: 200
* `INDEX_TYPE`: BLOOM
* `hoodie.parquet.compression.ratio`: 0.1
* `hoodie.parquet.max.file.size`: 125829120

**Environment Description**

* Hudi version: 0.11.1
* Spark version: 3.2.2
* Hive version: 3.1.3
* Hadoop version: 3.3.2
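For scale, the short Python sketch below compares the oversized file with the configured `hoodie.parquet.max.file.size`. It assumes both numbers are byte counts: the Hudi setting is specified in bytes, but the unit of 792329527 is inferred from the screenshot and not stated in the issue. The constant names and the `to_mib` helper are illustrative only.

```python
# Numbers copied from the issue. Both are ASSUMED to be byte counts:
# hoodie.parquet.max.file.size is a byte value, and the skewed file
# size is assumed to be reported in bytes as well.
MAX_FILE_SIZE = 125_829_120      # hoodie.parquet.max.file.size
SKEWED_FILE_SIZE = 792_329_527   # parquet file written by task index 2

MIB = 1024 * 1024  # bytes per mebibyte


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (hypothetical helper)."""
    return num_bytes / MIB


# How many times larger the skewed file is than the configured target.
ratio = SKEWED_FILE_SIZE / MAX_FILE_SIZE

print(f"configured max file size: {to_mib(MAX_FILE_SIZE):.1f} MiB")
print(f"skewed file size:         {to_mib(SKEWED_FILE_SIZE):.1f} MiB")
print(f"skewed file / max size:   {ratio:.1f}x")
```

Under that assumption, the script prints 120.0 MiB for the configured target, 755.6 MiB for the skewed file, and a ratio of about 6.3x. In other words, the straggler task wrote roughly six target-sized files' worth of data into a single parquet file.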
