It's probably due to GC. A quick way to confirm is sketched below the quote.

On Fri, May 20, 2016 at 5:54 PM, Yash Sharma <yash...@gmail.com> wrote:
> Hi All,
>
> I am here to get some expert advice on a use case I am working on.
>
> Cluster & job details below -
>
> Data - 6 TB
> Cluster - EMR - 15 nodes, C3-8xLarge (shared by other MR apps)
>
> Parameters -
> --executor-memory 10G \
> --executor-cores 6 \
> --conf spark.dynamicAllocation.enabled=true \
> --conf spark.dynamicAllocation.initialExecutors=15 \
>
> Runtime: 3 hrs
>
> On monitoring the metrics I noticed that 10G for executors is not required
> (since I don't have a lot of groupings).
>
> Reducing to --executor-memory 3G brought the runtime down to 2 hrs.
>
> Question:
> Adding more nodes now has absolutely no effect on the runtime. Is there
> anything I can tune/change/experiment with to make the job faster?
>
> Workload: mostly reduceBy's and scans.
>
> Would appreciate any insights and thoughts.
>
> Best Regards
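To check whether it really is GC, you could turn on GC logging for the executors and compare "GC Time" against "Task Time" in the Executors tab of the Spark UI. A minimal sketch; spark.executor.extraJavaOptions is the standard Spark config for this, and the flags are ordinary HotSpot options (adjust for your JVM version):

  # Emit per-collection GC details in the executor stdout logs
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \

If GC time is a large fraction of task time on the 10G executors, that would line up with what you saw: shrinking the heap to 3G cut the runtime by an hour.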
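On why extra nodes have no effect: dynamic allocation only requests more executors while tasks are backlogged, so once there are enough slots to cover a stage's partition count, additional nodes just sit idle. Raising the shuffle parallelism is usually the first lever to try. A sketch under that assumption; the numbers below are illustrative, not recommendations:

  --executor-memory 3G \
  --executor-cores 3 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=120 \
  --conf spark.default.parallelism=2000 \

spark.default.parallelism sets the partition count that reduceByKey falls back to when you don't pass one explicitly; you can also pass a partition count directly to the reduceByKey calls. Either way, if the shuffle only has a few hundred partitions, the extra hardware has nothing to run.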