Hi All,
I am here to get some expert advice on a use case I am working on.

Cluster & job details below -

Data - 6 TB
Cluster - EMR, 15 nodes, c3.8xlarge (shared with other MR apps)

Parameters -
--executor-memory 10G \
--executor-cores 6 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.initialExecutors=15 \

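In case it matters, I'm also considering bounding dynamic allocation explicitly. A sketch of the extra flags (the max of 60 is an illustrative guess, not something from my current config; as far as I know dynamic allocation also requires the external shuffle service):

--conf spark.dynamicAllocation.minExecutors=15 \
--conf spark.dynamicAllocation.maxExecutors=60 \
--conf spark.shuffle.service.enabled=true \
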
Runtime: 3 hrs

Monitoring the metrics, I noticed that 10G per executor is not required
(since I don't have a lot of groupings).

Reducing to --executor-memory 3G cut the runtime to 2 hrs.
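
For reference, a sketch of the revised memory flags with an explicit overhead (the 512m is illustrative; Spark's default overhead is max(384 MB, 10% of executor memory), and on older releases the key is spark.yarn.executor.memoryOverhead):

--executor-memory 3G \
--conf spark.executor.memoryOverhead=512m \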

Question:
Adding more nodes now has absolutely no effect on the runtime. Is there
anything I can tune/change/experiment with to make the job faster?

Workload: mostly reduceByKey operations and scans.
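
One guess on my end: with reduceByKey, parallelism is capped by the shuffle partition count, so extra nodes may just sit idle. A sketch of what I could try (270 is illustrative, roughly 3x the 90 cores from 15 executors x 6 cores; the same effect is available by passing a numPartitions argument to reduceByKey):

--conf spark.default.parallelism=270 \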

Would appreciate any insights and thoughts.

Best regards
