The param is for “Default number of partitions in RDDs returned by 
transformations like join, reduceByKey, and parallelize when NOT set by user.”

 

While Deepak is setting the number of partitions EXPLICITLY, so that default does not apply.
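To illustrate the difference, a minimal sketch (the 200/400 counts below are just placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.default.parallelism only applies when no partition count is passed explicitly
    val conf = new SparkConf()
      .setAppName("parallelism-demo")
      .set("spark.default.parallelism", "200")
    val sc = new SparkContext(conf)
    val pairs = sc.parallelize(1 to 1000).map(x => (x % 10, x))

    pairs.reduceByKey(_ + _)        // no count given -> 200 partitions (the default above)
    pairs.reduceByKey(_ + _, 400)   // explicit count wins; the setting is ignored
    pairs.repartition(400)          // likewise explicit, unaffected by the setting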

 

From: 李铖 [mailto:lidali...@gmail.com] 
Sent: Friday, June 5, 2015 11:08 AM
To: ÐΞ€ρ@Ҝ (๏̯͡๏)
Cc: Evo Eftimov; user
Subject: Re: How to increase the number of tasks

 

Just multiply 2-4 by the number of CPU cores on the node.
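For example, a rough sketch of that rule of thumb (the 16-core figure is only an example, not from this thread):

    import org.apache.spark.SparkConf

    // rule of thumb: 2-4 partitions per CPU core
    val coresPerNode = 16                        // example figure only
    val parallelism  = coresPerNode * 3          // somewhere in the 2x-4x range
    val conf = new SparkConf().set("spark.default.parallelism", parallelism.toString)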

 

2015-06-05 18:04 GMT+08:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>:

I did not change spark.default.parallelism.

What is the recommended value for it?

 

On Fri, Jun 5, 2015 at 3:31 PM, 李铖 <lidali...@gmail.com> wrote:

Did you change the value of 'spark.default.parallelism'? Try setting it to a bigger 
number.

 

2015-06-05 17:56 GMT+08:00 Evo Eftimov <evo.efti...@isecc.com>:

It may be that your system runs out of resources (i.e., 174 is the ceiling) due to 
the following:

 

1. RDD Partition = (Spark) Task
2. RDD Partition != (Spark) Executor
3. (Spark) Task != (Spark) Executor
4. (Spark) Task = JVM Thread
5. (Spark) Executor = JVM instance
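To make the mapping concrete, a minimal sketch of the corresponding resource settings (the instance and core counts are placeholders):

    import org.apache.spark.SparkConf

    // Each Executor is one JVM instance; each Task runs as a thread inside an Executor.
    val conf = new SparkConf()
      .set("spark.executor.instances", "10")   // 10 executor JVMs
      .set("spark.executor.cores", "4")        // 4 concurrent task threads per executor
    // At most 10 * 4 = 40 tasks run at the same time; the remaining tasks of a
    // 174-partition stage wait in the queue until a thread frees up.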

 

From: ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepuj...@gmail.com] 
Sent: Friday, June 5, 2015 10:48 AM
To: user
Subject: How to increase the number of tasks

 

I have a stage that spawns 174 tasks when I run repartition on Avro data.

Tasks read between 512/317/316/214/173 MB of data. Even if I increase the number 
of executors or the number of partitions (when calling repartition), the number of 
tasks launched stays fixed at 174.
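(A minimal sketch of the kind of call involved, just to show the pattern; the path and the 800 below are placeholders:)

    // run from spark-shell, where sc already exists
    val input = sc.textFile("/path/to/avro/dir")   // stand-in for the actual Avro read
    println(input.partitions.length)               // number of tasks in the read stage
    val out = input.repartition(800)
    println(out.partitions.length)                 // number of tasks in the shuffled stage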

 

1) I want to speed up this stage. How do I do it?

2) A few tasks finish in 20 minutes, a few in 15, and a few in less than 10. Why this 
behavior?

Since this is a repartition stage, it should not depend on the nature of the data.

 

It's taking more than 30 minutes, and I want to speed it up by throwing more 
executors at it.

 

Please suggest.

 

Deepak

 

 





 

-- 

Deepak

 

 
