n3nash commented on issue #2068: URL: https://github.com/apache/hudi/issues/2068#issuecomment-693535202
@bradleyhurley There isn't an out-of-the-box formula for this, since it depends on the latency you want to achieve. A good way to find the right size is to start with a high number of executors and enable dynamic allocation -> https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation. This lets the job run at maximum throughput, and you can then tune the executor count up or down based on what you observe. For memory, the easiest approach is to use a JVM profiler to measure actual executor and driver memory usage, for example -> https://github.com/uber-common/jvm-profiler or https://github.com/linkedin/dr-elephant
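As a minimal sketch of the approach above, a `spark-submit` invocation with dynamic allocation enabled might look like the following. The executor bounds, memory sizes, and jar name are illustrative placeholders, not recommendations; tune them based on what the profiler shows:

```shell
# Sketch only: enable dynamic allocation so Spark scales executors
# between the stated bounds. All numeric values and the jar name
# are hypothetical; adjust for your workload and cluster.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=50 \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.executor.memory=4g \
  --conf spark.driver.memory=2g \
  your-hudi-job.jar
```

Note that dynamic allocation needs a way to preserve shuffle data when executors are removed: either the external shuffle service (`spark.shuffle.service.enabled=true`) or, on Spark 3.0+, shuffle tracking as shown here.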
