Vsevolod3 commented on issue #8071:
URL: https://github.com/apache/hudi/issues/8071#issuecomment-1591123193

   Thank you, Danny. Yes, I got the best performance by using the BUCKET index 
with non-nested, numeric, non-Hive-style partitions. We were able to get the 
stream_write task to run in under 3 minutes when we picked a partitioning 
field that resulted in fewer partitions (14 partitions) compared to our 
previous tests (> 90 partitions).
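
   One possible way to reason about the difference, as a rough sketch only: 
under a simple bucket index each partition holds a fixed number of buckets, 
and each bucket maps to one file group, so the number of file groups the 
writer has to manage grows linearly with the partition count. The bucket 
count of 4 below is an assumed value for illustration, not the setting from 
our job, and `total_file_groups` is a hypothetical helper, not a Hudi API.

```python
def total_file_groups(num_partitions: int, buckets_per_partition: int) -> int:
    """Estimate file groups for a simple bucket index:
    a fixed number of buckets per partition, one file group per bucket."""
    if num_partitions < 0 or buckets_per_partition < 1:
        raise ValueError("expected num_partitions >= 0 and buckets_per_partition >= 1")
    return num_partitions * buckets_per_partition


# Assumed bucket count, for illustration only (not taken from the actual job).
ASSUMED_BUCKETS_PER_PARTITION = 4

# 90 is used as a lower bound for the earlier tests (> 90 partitions).
for label, partitions in (("previous tests", 90), ("new partition field", 14)):
    groups = total_file_groups(partitions, ASSUMED_BUCKETS_PER_PARTITION)
    print(f"{label}: {partitions} partitions -> {groups} file groups")
```

   This only counts file groups; it does not model checkpoint time, write 
parallelism, or file sizes, so it should not be read as a full explanation of 
the speedup.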
   
   It seems the number of partitions Hudi has to manage has a _very_ large 
impact on performance. Would you be able to share any documents or blog posts 
that explain partitioning for performance in more detail? I read through most 
of the documents under Concepts on the Hudi website (e.g. 
https://hudi.apache.org/docs/next/indexing), but didn't find much that covers 
partitioning strategies in depth.

