xushiyan commented on issue #3751:
URL: https://github.com/apache/hudi/issues/3751#issuecomment-1025300721


   > num-executors 19
   > executor-cores 1
   > executor-memory 6g
   
   @MikeBuh with this setting you have 19 cores in total, so roughly 30-40 is a 
reasonable parallelism to set for Spark's default/shuffle partitions and the 
Hudi parallelism configs, given each core typically handles 1.5-2 concurrent 
tasks. I'd suggest increasing executor cores to 3-5 to improve throughput, and 
tuning the other settings accordingly. You also want to keep 
`spark.default.parallelism`, `spark.sql.shuffle.partitions`, and the Hudi 
parallelism configs (there are a few of them) aligned with each other.
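   As a sketch of the sizing math and the alignment (the numbers are illustrative starting points derived from the setup above, not universal recommendations):

```python
# Illustrative sizing: 19 executors x 1 core, ~1.5-2 concurrent tasks per core
num_executors, executor_cores = 19, 1
total_cores = num_executors * executor_cores
parallelism = total_cores * 2  # upper end of the 1.5-2x guideline

# Keep the Spark and Hudi parallelism settings aligned with each other
spark_confs = {
    "spark.default.parallelism": str(parallelism),
    "spark.sql.shuffle.partitions": str(parallelism),
}
hudi_opts = {
    "hoodie.insert.shuffle.parallelism": str(parallelism),
    "hoodie.upsert.shuffle.parallelism": str(parallelism),
    "hoodie.bulkinsert.shuffle.parallelism": str(parallelism),
}
```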
   
   > hoodie.datasource.write.row.writer.enable: true
   
   This currently only applies to the bulk insert operation.
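   In other words, the flag only takes effect when the write operation is `bulk_insert` (option values below are illustrative):

```python
# Row-writer path only kicks in for bulk_insert; with upsert/insert
# this flag has no effect as of now
hudi_opts = {
    "hoodie.datasource.write.operation": "bulk_insert",
    "hoodie.datasource.write.row.writer.enable": "true",
}
```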
   
   >  data seems to be skewed and thus not easy to partition using a field and 
ensuring even distribution
   
   Usually you'd salt the partitioning key to spread out skewed data. Write 
performance won't improve much without handling the skew properly.
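   The salting idea, as a minimal plain-Python sketch (in Spark you'd apply the same logic with `withColumn`; the key names and salt count here are made up for illustration):

```python
import hashlib

NUM_SALTS = 8  # spread each hot key across this many buckets

def salted_key(key: str, record_id: str, num_salts: int = NUM_SALTS) -> str:
    """Derive a salted partition key: a hot key like 'US' becomes
    'US_0'..'US_7', so its records spread across num_salts partitions
    instead of piling into one."""
    salt = int(hashlib.md5(record_id.encode()).hexdigest(), 16) % num_salts
    return f"{key}_{salt}"

# Records sharing the same skewed key now land in different buckets
buckets = {salted_key("US", f"rec-{i}") for i in range(1000)}
```

   To read the data back by the original key you'd scan all of its salt buckets, which is the usual trade-off with this technique.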
   
   Hope these help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

