rnatarajan commented on issue #2083:
URL: https://github.com/apache/hudi/issues/2083#issuecomment-692170318
Update on what @rafaelhbarros has mentioned.
With Hudi 0.6.0, we identified a bottleneck in the sort step of bulk insert
and turned it off (hoodie.bulkinsert.sort.mode = NONE).
Matching the parallelism to the number of cores available in total
(cores per executor * number of executors) gives the best speed.
If cores*executors = 10 and the parallelism is 20, the 10 cores cannot
actually run 20 tasks at once, so the time taken to process the records
goes up.
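
A rough sketch of that arithmetic in Python. It assumes each core runs one task at a time (Spark's default of one CPU per task) and that all tasks take about the same time; the function names are illustrative only and are not part of Spark or Hudi. It counts scheduling waves and nothing else, so it is a way to reason about the setting, not a performance model.

```python
import math


def task_slots(executor_cores: int, num_executors: int) -> int:
    """Number of tasks that can run at the same time (one task per core)."""
    return executor_cores * num_executors


def waves(parallelism: int, slots: int) -> int:
    """Number of rounds needed to run `parallelism` tasks on `slots` cores."""
    return math.ceil(parallelism / slots)


# The example from the comment: 10 cores in total, parallelism of 20.
print(waves(20, 10))  # 2 -> half of the tasks wait for the first round
print(waves(10, 10))  # 1 -> every task runs immediately
```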
With Hudi MoR and bulk insert without sort, using the parameters that
@rafaelhbarros posted, we were able to achieve about 20K rows per second.
With Hudi CoW and insert mode without sort, we were able to achieve 15K rows
per second.
We are aiming to achieve about 20K rows per second with similar hardware
(--driver-memory 4G --executor-memory 5G --executor-cores 4 --num-executors 6).
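
For illustration, this is roughly how the settings above could be put together as Hudi write options in Python. It is a sketch under assumptions: the helper name is made up, the table type shown is the MoR case, and the parallelism is simply executor cores times executors as described earlier. The actual parameters @rafaelhbarros posted are not reproduced here.

```python
def hudi_bulk_insert_options(executor_cores: int, num_executors: int) -> dict:
    """Build Hudi write options with parallelism matched to the total cores.

    Hypothetical helper for illustration only; not part of Hudi.
    """
    parallelism = executor_cores * num_executors
    return {
        "hoodie.datasource.write.operation": "bulk_insert",
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        # Turn off the bulk insert sort that was identified as the bottleneck.
        "hoodie.bulkinsert.sort.mode": "NONE",
        "hoodie.bulkinsert.shuffle.parallelism": str(parallelism),
    }


# --executor-cores 4 --num-executors 6 gives 24 cores in total.
opts = hudi_bulk_insert_options(executor_cores=4, num_executors=6)
for key, value in opts.items():
    print(f"{key}={value}")
```

These options would be passed to the writer together with the usual required ones (table name, record key, and so on), which are left out here.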
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]