garyli1019 commented on issue #800: Performance tuning
URL: https://github.com/apache/incubator-hudi/issues/800#issuecomment-514332156
 
 
   Sure, I can try that.
   The delta data was definitely very dirty (a lot of incoming old records 
require rewriting existing parquet files). Task duration seems to increase 
exponentially with shuffle read size. 
   
   Also, this job does not release executors when their tasks have finished. 
For example, I gave this job 100 executors. Two tasks ran for 20 hours while 
the others finished in minutes, so the job held all 100 executors for the full 
20 hours. Is it possible to improve this?
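
For reference, the usual Spark mechanism for giving back idle executors is dynamic allocation. Whether it is enabled for, or helps with, this particular job is an assumption, not something confirmed in this thread. A minimal sketch of the relevant `spark-submit` flags follows; the values are illustrative (100 mirrors the executor count above) and `to_submit_args` is a hypothetical helper, not part of Spark or Hudi:

```python
# Illustrative Spark properties for dynamic allocation (values are assumptions).
DYNAMIC_ALLOCATION_CONF = {
    # Let Spark add and remove executors based on the pending task backlog.
    "spark.dynamicAllocation.enabled": "true",
    # External shuffle service, so shuffle files outlive a removed executor.
    "spark.shuffle.service.enabled": "true",
    # Upper bound on executors; matches the 100 executors mentioned above.
    "spark.dynamicAllocation.maxExecutors": "100",
    # Release an executor once it has been idle for this long.
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}


def to_submit_args(conf):
    """Hypothetical helper: turn a property dict into spark-submit --conf flags."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the flags so they can be pasted into a spark-submit command line.
    print(" ".join(to_submit_args(DYNAMIC_ALLOCATION_CONF)))
```

With settings like these, executors whose tasks finished in minutes could in principle be returned to the cluster while the two long-running tasks continue.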
   
   ![Screen Shot 2019-07-23 at 11 22 37 AM](https://user-images.githubusercontent.com/23007841/61737200-94f15e00-ad3c-11e9-8016-d4f7cd5f8ead.png)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
