Have you tried the following method?

 * Note: With shuffle = true, you can actually coalesce to a larger number
 * of partitions. This is useful if you have a small number of partitions,
 * say 100, potentially with a few partitions being abnormally large. Calling
 * coalesce(1000, shuffle = true) will result in 1000 partitions with the
 * data distributed using a hash partitioner.
 */
def coalesce(numPartitions: Int, shuffle: Boolean = false)(implicit ord: Ordering[T] = null)
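A minimal sketch of the idea (assuming a local SparkContext; the object and variable names here are illustrative, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("coalesce-sketch").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // A single-partition RDD: every operation on it runs as one task.
    val rdd = sc.parallelize(1 to 1000, numSlices = 1)
    println(rdd.getNumPartitions) // 1

    // With shuffle = true, coalesce can INCREASE the partition count,
    // redistributing the data with a hash partitioner.
    val wider = rdd.coalesce(8, shuffle = true)
    println(wider.getNumPartitions) // 8

    // repartition(n) is equivalent to coalesce(n, shuffle = true).
    val same = rdd.repartition(8)
    println(same.getNumPartitions) // 8

    sc.stop()
  }
}
```

Note that the shuffle has a cost, but it is what restores parallelism once the data is spread across partitions again.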
Cheers

On Mon, Dec 21, 2015 at 2:47 AM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote:
> Dear All,
>
> For some RDD, if there is just one partition, then every operation and
> computation on it runs as a single task, and the RDD loses all the
> parallelism benefit of the Spark system ...
>
> Is it exactly like that?
>
> Thanks very much in advance!
> Zhiliang