n3nash commented on issue #2068: URL: https://github.com/apache/hudi/issues/2068#issuecomment-693535202
@bradleyhurley There isn't an out-of-the-box formula for this, since it depends on the latency you want to achieve. A good way to find the right size is to start with a high number of executors and enable dynamic allocation -> https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation. This lets the job run at maximum throughput, and you can then tune the executor count up or down based on what you observe. For memory, the easiest approach is to use a JVM profiler to measure actual executor and driver memory usage, for example -> https://github.com/uber-common/jvm-profiler or https://github.com/linkedin/dr-elephant
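As a minimal sketch of the approach above, a `spark-submit` invocation with dynamic allocation enabled might look like the following. The executor bounds, memory sizes, and jar name are illustrative placeholders, not recommendations; tune them based on what the profiler shows:

```shell
# Sketch only: enable dynamic allocation so Spark scales executors
# between the stated bounds. All numeric values and the jar name
# are hypothetical; adjust for your workload and cluster.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=50 \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.executor.memory=4g \
  --conf spark.driver.memory=2g \
  your-hudi-job.jar
```

Note that dynamic allocation needs a way to preserve shuffle data when executors are removed: either the external shuffle service (`spark.shuffle.service.enabled=true`) or, on Spark 3.0+, shuffle tracking as shown here.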
