Re: Expert advice needed. (POC is at crossroads)

2015-04-30 Thread ๏̯͡๏
1) As I am limited to 12G and I was doing a broadcast join (collect the data and then publish it), it was throwing OOM. The data size was 25G and the limit was 12G, hence I reverted to a regular join. 2) I started using repartitioning; I started with 100 partitions and am now trying 200. At the beginning it looked
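
For a rough sense of the numbers in this post, the arithmetic can be sketched as below. This is only a back-of-the-envelope illustration, not Spark code: the helper names `partition_size_mib` and `fits_in_memory` are hypothetical, "G" is read as GiB, and the data is assumed to be spread evenly across partitions with no skew, which real joins rarely guarantee.

```python
# Back-of-the-envelope sizing; not Spark API calls.
# Figures (25G data, 12G limit, 100 and 200 partitions) come from the post;
# reading "G" as GiB and assuming an even spread are assumptions.
GIB = 1024 ** 3
MIB = 1024 ** 2


def partition_size_mib(total_bytes: int, num_partitions: int) -> float:
    """Average partition size in MiB, assuming no skew (hypothetical helper)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return total_bytes / num_partitions / MIB


def fits_in_memory(table_bytes: int, memory_limit_bytes: int) -> bool:
    """True if a collected table is smaller than the memory limit.

    This ignores deserialization and object overhead, so a real broadcast
    needs considerably more headroom than this check implies.
    """
    return table_bytes < memory_limit_bytes


data_bytes = 25 * GIB
limit_bytes = 12 * GIB

# A 25G table cannot be collected under a 12G limit.
print("broadcast fits:", fits_in_memory(data_bytes, limit_bytes))

# Average partition size for the two partition counts tried in the post.
for n in (100, 200):
    print(n, "partitions:", partition_size_mib(data_bytes, n), "MiB each")
```

Under these assumptions, going from 100 to 200 partitions halves the average partition size, which is the main lever when individual tasks run out of memory.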

RE: Expert advice needed. (POC is at crossroads)

2015-04-30 Thread java8964
I'm really not an expert here, but try the following ideas: 1) I assume you are using YARN; if so, this blog post is very good on resource tuning: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ 2) If 12G is a hard limit in this case, then you have no option but lower

Re: Expert advice needed. (POC is at crossroads)

2015-04-30 Thread Sandy Ryza
Hi Deepak, I wrote a couple of posts with a bunch of different information about how to tune Spark jobs. The second one might help with how to think about tuning the number of partitions and resources. What kind of OOMEs are you hitting?