Hi Deepak,

I wrote a couple of posts covering different aspects of how to tune Spark
jobs.  The second one might help with how to think about tuning the number
of partitions and resources.  What kind of OOMEs are you hitting?

http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

-Sandy


On Thu, Apr 30, 2015 at 5:03 PM, java8964 <java8...@hotmail.com> wrote:

> I'm really not an expert here, but try the following ideas:
>
> 1) I assume you are using YARN; if so, this blog post is very good on
> resource tuning:
> http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
>
> 2) If 12G is a hard limit in this case, then you have no option but to
> lower your concurrency. As a first step, try setting "--executor-cores=1",
> which will force each executor to run only one task at a time. This is the
> least efficient setting for your job, but see whether the application can
> finish without an OOM.
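>
> For example, a submit command along these lines (the class and jar names
> below are just placeholders for yours, not real values):
>
>   spark-submit \
>     --master yarn-cluster \
>     --num-executors 96 \
>     --executor-memory 12g \
>     --executor-cores 1 \
>     --class com.example.YourJob \
>     your-job.jar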
>
> 3) Add more partitions to your RDD. For a given RDD, more partitions means
> each partition will contain less data, which requires less memory to
> process. If each partition is then processed by one core in each executor,
> you have brought the executor's memory requirement down to close to its
> lowest level.
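>
> A minimal Scala sketch of this (the input path, key/value types, and the
> partition count are illustrative only, not tuned values):
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   val sc = new SparkContext(new SparkConf().setAppName("repartition-sketch"))
>   // More partitions => less data per partition => lower per-task memory.
>   val raw = sc.sequenceFile[String, String]("hdfs:///path/to/input")
>   val finer = raw.repartition(2000)  // try multiples of total executor cores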
>
> 4) Do you cache data? Don't cache it for now, and lower
> "spark.storage.memoryFraction", so that less memory is reserved for the
> cache.
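>
> For instance, in code it would look like this (0.2 is just a starting
> point to experiment with; the default in Spark 1.x is 0.6):
>
>   import org.apache.spark.SparkConf
>
>   val conf = new SparkConf()
>     // Shrink the cache's share of the heap, leaving more for task execution.
>     .set("spark.storage.memoryFraction", "0.2")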
>
> Since your top priority is to avoid OOM, all of the above steps will make
> the job run slower or less efficiently. In any case, you should first check
> your code logic to see if there is any room for improvement, though we
> assume your code is already optimized, as stated in your email. If the
> above steps still don't cure the OOM, then maybe the data for one partition
> simply cannot fit in a 12G heap, given the logic your code applies to it.
>
> Yong
>
> ------------------------------
> From: deepuj...@gmail.com
> Date: Thu, 30 Apr 2015 18:48:12 +0530
> Subject: Expert advise needed. (POC is at crossroads)
> To: user@spark.apache.org
>
>
> I am at a crossroads now, and expert advice will help me decide what the
> next course of the project is going to be.
>
> Background: At our company we process tons of data to help build an
> experimentation platform. We fire more than 300 M/R jobs over petabytes
> of data; the pipeline takes 24 hours and does lots of joins. It's simply
> stupendously complex.
>
> POC: Migrate a small portion of the processing to Spark, aiming for 10x
> gains. Today this processing takes 2.5 to 3 hours in the M/R world.
>
> Data Sources: 3 (all on HDFS)
> Format: two in SequenceFile and one in Avro
> Data Size:
> 1)  64 files    169,380,175,136 bytes - SequenceFile
> 2) 101 files     84,957,259,664 bytes - Avro
> 3) 744 files  1,972,781,123,924 bytes - SequenceFile
>
> Process
> A) Map-side join of #1 and #2
> B) Left outer join of A) and #3
> C) reduceByKey on B)
> D) Map-only processing of C) (a rough sketch of A-D appears after the
> Optimizations list below)
>
> Optimizations
> 1) Converted the equi-join in #A to a map-side join (broadcast variables).
> 2) Converted groupByKey + map into reduceByKey in #C.
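>
> Roughly, the shape of A-D with both optimizations looks like this (the
> inputs, types, and paths below are tiny placeholders, not the real job):
>
>   import org.apache.spark.{SparkConf, SparkContext}
>   import org.apache.spark.SparkContext._
>
>   val sc = new SparkContext(new SparkConf().setAppName("poc-sketch"))
>
>   val big1  = sc.parallelize(Seq(("k1", "a"), ("k2", "b")))  // stands in for #1
>   val small = sc.parallelize(Seq(("k1", "x")))               // stands in for #2
>   val big3  = sc.parallelize(Seq(("k1", 1L), ("k3", 3L)))    // stands in for #3
>
>   // A) Map-side equi-join: broadcast the small side, no shuffle.
>   val smallBc = sc.broadcast(small.collectAsMap())
>   val a = big1.flatMap { case (k, v) => smallBc.value.get(k).map(x => (k, (v, x))) }
>
>   // B) Left outer join of A) against #3 (this step still shuffles).
>   val b = a.leftOuterJoin(big3)
>
>   // C) reduceByKey instead of groupByKey + map: combines on the map side.
>   val c = b.mapValues { case ((v, x), n) => n.getOrElse(0L) }.reduceByKey(_ + _)
>
>   // D) Map-only final pass.
>   c.map { case (k, sum) => s"$k\t$sum" }.saveAsTextFile("hdfs:///tmp/poc-out")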
>
> I have a huge YARN (Hadoop 2.4.x) cluster at my disposal, but I am limited
> to using only 12G on each node.
>
> 1) My POC (after a month of crazy research and lots of Q&A on this amazing
> forum) runs fine with 1 file from each of the above data sets and finishes
> in 10 mins on 4 executors. I started at 60 mins and got it down to 10 mins.
> 2) With 5 files from each data set, it takes 45 mins and 16 executors.
> 3) When I run against 10 files, it fails repeatedly with OOM and several
> timeout errors.
> Configs: --num-executors 96 --driver-memory 12g --driver-java-options
> "-XX:MaxPermSize=10G" --executor-memory 12g --executor-cores 4, Spark 1.3.1
>
>
> Expert Advice
> My goal is simple: complete the processing at 10x to 100x the speed of
> M/R, or show that it is not possible with Spark.
>
> *A) 10x to 100x*
> 1) What will it take in terms of number of executors, executor cores,
> memory per executor, and any unknown magic settings I am supposed to
> apply to reach this goal?
> 2) I am attaching the code for review; is there anything in it that can
> further speed up processing, if that is possible at all?
> 3) Do I need to do something else?
>
> *B) Give up and wait for the next amazing tech to come along*
> Given the steps I have performed so far, should I conclude that 10x to
> 100x gains are not possible and that I am stuck with the M/R world for
> now?
>
> I am in need of help here. I am available for discussion at any time
> (day/night).
>
> I hope I have provided all the details.
> Regards,
> Deepak
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
