I have a stage that spawns 174 tasks when I run repartition on Avro data.
Tasks read varying amounts of data, roughly 173 MB to 512 MB each
(512/317/316/214/173 MB). Even if I increase the number of executors or the
number of partitions (when calling repartition), the number of tasks
launched remains fixed at 174.
1) I want to speed this up.
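A minimal sketch of the setup, assuming Spark 1.4+ with the Databricks
spark-avro package (the path and partition counts are illustrative): the
stage that reads the files gets one task per input split, which is why the
174 stays fixed, while repartition() only sizes the stage after the shuffle.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("repartition-avro"))
    val sqlContext = new SQLContext(sc)

    // Hypothetical input path.
    val df = sqlContext.read.format("com.databricks.spark.avro").load("/data/events.avro")

    // The read stage gets one task per input split, so its task count
    // (174 here) is fixed by the input layout, not by repartition().
    println(df.rdd.partitions.length)     // e.g. 174

    // repartition() inserts a shuffle; only the stage *after* the
    // shuffle runs with the requested number of partitions.
    val wider = df.repartition(400)
    println(wider.rdd.partitions.length)  // 400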
I did not change spark.default.parallelism.
What is the recommended value for it?
On Fri, Jun 5, 2015 at 3:31 PM, 李铖 lidali...@gmail.com wrote:
Did you change the value of 'spark.default.parallelism'? Try setting it to
a bigger number.
Just multiply the number of CPU cores per node by 2-4.
2015-06-05 18:04 GMT+08:00 ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com:
I did not change spark.default.parallelism.
What is the recommended value for it?
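For reference, spark.default.parallelism is set on the SparkConf before the
SparkContext is created; a minimal sketch of the 2-4x-cores rule of thumb
above (all numbers are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative: a cluster with 40 total cores, using 3x cores.
    val conf = new SparkConf()
      .setAppName("parallelism-example")
      .set("spark.default.parallelism", "120")
    val sc = new SparkContext(conf)

    // Shuffle transformations called without an explicit numPartitions
    // (e.g. reduceByKey, join) now default to 120 partitions.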
It may be that your system runs out of resources (i.e. 174 is the ceiling)
due to the following (the sketch after the list makes the ceiling concrete):
1. RDD Partition = (Spark) Task
2. RDD Partition != (Spark) Executor
3. (Spark) Task != (Spark) Executor
4. (Spark) Task = JVM Thread
5. (Spark) Executor = JVM
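To make the ceiling concrete: since each task is a thread inside an executor
JVM, at most (number of executors) x (cores per executor) tasks run at the
same time, and a 174-task stage executes in waves. A sketch with hypothetical
sizing (these settings apply on YARN):

    import org.apache.spark.SparkConf

    // Hypothetical sizing: 10 executor JVMs x 4 task threads each.
    val conf = new SparkConf()
      .set("spark.executor.instances", "10") // 10 executor JVMs (YARN)
      .set("spark.executor.cores", "4")      // 4 concurrent task threads per JVM

    // At most 10 * 4 = 40 tasks run concurrently, so a 174-task stage
    // completes in ceil(174 / 40) = 5 waves.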
The param is for the “Default number of partitions in RDDs returned by
transformations like join, reduceByKey, and parallelize when NOT set by
user,” whereas Deepak is setting the number of partitions EXPLICITLY.
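In other words, an explicit partition count wins over the default; a short
sketch (the input path and counts are illustrative):

    // Given an existing SparkContext `sc` (as in the sketches above),
    // build a hypothetical pair RDD.
    val pairs = sc.textFile("/data/input.txt").map(line => (line, 1))

    // No explicit count: falls back to spark.default.parallelism
    // (or a value derived from the parent RDD when it is unset).
    val byDefault = pairs.reduceByKey(_ + _)

    // Explicit count: spark.default.parallelism is ignored here.
    val explicitReduce = pairs.reduceByKey(_ + _, 400)
    val explicitRepart = pairs.repartition(400)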