---------- Forwarded message ----------
From: Liquan Pei <liquan...@gmail.com>
Date: Mon, Sep 29, 2014 at 2:12 PM
Subject: Re: about partition number
To: anny9699 <anny9...@gmail.com>


The number of cores available in your cluster determines the number of
tasks that can run concurrently. If your data is evenly partitioned,
the number of partitions should be approximately equal to
total_coreNumber (16 in your case: 8 workers x 2 cores).
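
To make the arithmetic concrete, here is a rough sketch in plain Python
(no Spark needed). It assumes one task per partition, one task per core
at a time, and tasks of roughly equal duration, which is a
simplification. The helper name task_waves is made up for illustration;
it is not a Spark API.

```python
import math


def task_waves(num_partitions: int, total_cores: int) -> int:
    """Rounds of tasks needed if each core runs one task at a time.

    Hypothetical helper for illustration only; assumes one task per
    partition and tasks of roughly equal duration.
    """
    if num_partitions < 0 or total_cores <= 0:
        raise ValueError("need num_partitions >= 0 and total_cores > 0")
    return math.ceil(num_partitions / total_cores)


# Cluster from the question: 8 workers with 2 cores each.
total_cores = 8 * 2

for partitions in (8, 16, 17, 32):
    waves = task_waves(partitions, total_cores)
    # Cores that have work in the first round.
    busy = min(partitions, total_cores)
    print(f"{partitions} partitions: {waves} round(s), "
          f"{busy}/{total_cores} cores busy in the first round")
```

Under these assumptions, fewer partitions than cores leaves cores
idle, and a partition count slightly above the core count adds a
second, mostly idle round. If you settle on a number, it is what you
would pass to something like rdd.repartition(n).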

Liquan

On Mon, Sep 29, 2014 at 2:01 PM, anny9699 <anny9...@gmail.com> wrote:

> Hi,
>
> I read the past posts about partition number, but am still a little
> confused about partitioning strategy.
>
> I have a cluster with 8 workers and 2 cores per worker. Is it true that
> the optimal partition number should be 2-4 * total_coreNumber, or should
> it approximately equal total_coreNumber? Or is it the task number that
> really determines the speed, rather than the partition number?
>
> Thanks a lot!


-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst


