Hi Anny,

Using many more partitions than cores is not recommended in general, because
it creates a lot of small tasks. Every task has to be sent to a worker node
for execution, so too many partitions increases task scheduling overhead.

Spark uses a synchronous execution model, which means that all tasks in a
stage need to finish before the next stage starts. Having 2-4 tasks per core
keeps the CPUs busy when some tasks are small and finish early.
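
To make the tradeoff concrete, here is a rough Python sketch (a toy model,
not Spark code). It assumes one partition is much heavier than the rest and
that every task pays a small fixed scheduling cost. The names stage_time and
split, the 0.05 overhead, and the task durations are all made up for
illustration; only the 8 workers x 2 cores come from your question.

```python
import heapq

def stage_time(task_durations, num_cores, per_task_overhead):
    """Time until the last task of one stage finishes.

    Each task is handed to whichever core becomes free first, and the
    stage is done only when every task has finished.
    """
    free_at = [0.0] * num_cores  # time at which each core is next idle
    heapq.heapify(free_at)
    for duration in task_durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + per_task_overhead + duration)
    return max(free_at)

def split(task_durations, k):
    """Split every partition into k equal pieces (k times as many tasks)."""
    return [d / k for d in task_durations for _ in range(k)]

TOTAL_CORES = 8 * 2   # 8 workers x 2 cores, from the question
OVERHEAD = 0.05       # assumed per-task scheduling cost, arbitrary time units

# Assumed uneven data: one partition is 5x heavier than the other 15.
coarse = [40.0] + [8.0] * 15

t_16 = stage_time(coarse, TOTAL_CORES, OVERHEAD)                # 1 task per core
t_48 = stage_time(split(coarse, 3), TOTAL_CORES, OVERHEAD)      # 3 tasks per core
t_1600 = stage_time(split(coarse, 100), TOTAL_CORES, OVERHEAD)  # 100 tasks per core

for label, t in [("16", t_16), ("48", t_48), ("1600", t_1600)]:
    print(f"{label:>5} partitions: stage time {t:.2f}")
```

In this toy model, 3 tasks per core finishes well ahead of 1 task per core,
because the cores that finish early pick up pieces of the heavy partition
instead of sitting idle. 100 tasks per core is slower than 3 because the
per-task overhead adds up. The numbers are invented, so only the shape of
the tradeoff matters, not the exact values.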

Hope this helps!
Liquan

On Mon, Sep 29, 2014 at 2:17 PM, 陈韵竹 <anny9...@gmail.com> wrote:

> Thanks Liquan! I thought about the same thing, but then why are people
> still using many more partitions than the number of cores?
>
> Anny
>
> On Mon, Sep 29, 2014 at 2:12 PM, Liquan Pei <liquan...@gmail.com> wrote:
>
>> The number of cores available in your cluster determines the number of
>> tasks that can be run concurrently.  If your data is evenly partitioned,
>> the number of partitions should be approximately equal to total_coreNumber.
>>
>> Liquan
>>
>> On Mon, Sep 29, 2014 at 2:01 PM, anny9699 <anny9...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I read the past posts about partition number, but am still a little
>>> confused about partitioning strategy.
>>>
>>> I have a cluster with 8 workers and 2 cores for each worker. Is it true
>>> that the optimal partition number should be 2-4 * total_coreNumber, or
>>> should it approximately equal total_coreNumber? Or is it the task number
>>> that really determines the speed, rather than the partition number?
>>>
>>> Thanks a lot!
>>>
>>
>>
>> --
>> Liquan Pei
>> Department of Physics
>> University of Massachusetts Amherst
>>
>
>


