rnatarajan commented on issue #2083:
URL: https://github.com/apache/hudi/issues/2083#issuecomment-692170318


   An update on what @rafaelhbarros mentioned.
   
   With Hudi 0.6.0, we identified a bottleneck in the sort step and turned it 
off (hoodie.bulkinsert.sort.mode = NONE).
   Matching the parallelism to the number of cores * executors available gives 
the best speed.
   If cores * executors = 10 and the parallelism is 20, the 10 available cores 
cannot run 20 tasks at once, so the extra tasks wait and processing the 
records takes longer.
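
   The arithmetic behind this can be sketched as follows. This is only an 
illustration, assuming the Spark default of one core per task 
(spark.task.cpus = 1); the function names are hypothetical and not part of 
Hudi or Spark.

```python
import math


def total_task_slots(num_executors: int, executor_cores: int) -> int:
    """Number of tasks the cluster can run at the same time,
    assuming each task occupies exactly one core."""
    return num_executors * executor_cores


def scheduling_waves(parallelism: int, slots: int) -> int:
    """Number of rounds needed to run `parallelism` tasks on `slots` cores."""
    return math.ceil(parallelism / slots)


# The example from the comment: 10 slots, parallelism 20.
print(scheduling_waves(20, 10))  # 2 (half of the tasks wait for a free core)

# The hardware quoted in this thread: 6 executors with 4 cores each.
slots = total_task_slots(num_executors=6, executor_cores=4)
print(slots)                        # 24
print(scheduling_waves(slots, slots))  # 1 (every task runs at once)
```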
   
   With Hudi MoR and bulk insert without sort, using the parameters that 
@rafaelhbarros posted, we were able to achieve about 20K rows per second.
   
   With Hudi CoW and insert mode, also without sort, we were able to achieve 
15K rows per second.
   
   We are aiming for about 20K rows per second on similar hardware 
(--driver-memory 4G --executor-memory 5G --executor-cores 4 
--num-executors 6).
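
   For reference, the two setups compared above might be expressed as Hudi 
write options roughly like this. It is a sketch, not the exact job 
configuration: the helper name is hypothetical, the parallelism of 24 is 
simply 4 cores * 6 executors from the flags above, and the option keys are 
the Hudi datasource and write config names as I understand them for 0.6.0.

```python
def hudi_write_options(table_type: str, operation: str, parallelism: int) -> dict:
    """Build a dict of Hudi write options (hypothetical helper).

    table_type: "MERGE_ON_READ" or "COPY_ON_WRITE"
    operation:  "bulk_insert" or "insert"
    """
    opts = {
        "hoodie.datasource.write.table.type": table_type,
        "hoodie.datasource.write.operation": operation,
    }
    if operation == "bulk_insert":
        # Match shuffle parallelism to the available cores and skip sorting.
        opts["hoodie.bulkinsert.shuffle.parallelism"] = str(parallelism)
        opts["hoodie.bulkinsert.sort.mode"] = "NONE"
    else:
        opts["hoodie.insert.shuffle.parallelism"] = str(parallelism)
    return opts


# 4 executor cores * 6 executors, taken from the spark-submit flags above.
PARALLELISM = 4 * 6

mor_bulk_insert = hudi_write_options("MERGE_ON_READ", "bulk_insert", PARALLELISM)
cow_insert = hudi_write_options("COPY_ON_WRITE", "insert", PARALLELISM)
```

   With PySpark, a dict like this would typically be passed via 
df.write.format("hudi").options(**mor_bulk_insert), together with the table 
name, record key and precombine field options, which are omitted here.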
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

