Hi,

Coalesce is used to decrease the number of partitions. If you give the value of 
numPartitions greater than the current partition, I don’t think RDD number of 
partitions will be increased.

Thanks,
Jasbir

From: Fei Hu [mailto:hufe...@gmail.com]
Sent: Sunday, January 15, 2017 10:10 PM
To: zouz...@cs.toronto.edu
Cc: user @spark <u...@spark.apache.org>; dev@spark.apache.org
Subject: Re: Equally split a RDD partition into two partition at the same node

Hi Anastasios,

Thanks for your reply. If I just increase the numPartitions to be twice larger, 
how coalesce(numPartitions: Int, shuffle: Boolean = false) keeps the data 
locality? Do I need to define my own Partitioner?

Thanks,
Fei

On Sun, Jan 15, 2017 at 3:58 AM, Anastasios Zouzias 
<zouz...@gmail.com<mailto:zouz...@gmail.com>> wrote:
Hi Fei,

How you tried coalesce(numPartitions: Int, shuffle: Boolean = false) ?

https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_spark_blob_branch-2D1.6_core_src_main_scala_org_apache_spark_rdd_RDD.scala-23L395&d=DgMFaQ&c=eIGjsITfXP_y-DLLX0uEHXJvU8nOHrUK8IrwNKOtkVU&r=7scIIjM0jY9x3fjvY6a_yERLxMA2NwA8l0DnuyrL6yA&m=bFMBTBwSwMOFRd7Or6fF0sQOH87UIhmuUqEO9UkxPIY&s=qNa3MyvKhIDlXHtxm3s0DZJRZaSWIHpaNhcS86GEQow&e=>

coalesce is mostly used for reducing the number of partitions before writing to 
HDFS, but it might still be a narrow dependency (satisfying your requirements) 
if you increase the # of partitions.

Best,
Anastasios

On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu 
<hufe...@gmail.com<mailto:hufe...@gmail.com>> wrote:
Dear all,

I want to equally divide a RDD partition into two partitions. That means, the 
first half of elements in the partition will create a new partition, and the 
second half of elements in the partition will generate another new partition. 
But the two new partitions are required to be at the same node with their 
parent partition, which can help get high data locality.

Is there anyone who knows how to implement it or any hints for it?

Thanks in advance,
Fei




--
-- Anastasios Zouzias


________________________________

This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise confidential information. If you have received it in 
error, please notify the sender immediately and delete the original. Any other 
use of the e-mail by you is prohibited. Where allowed by local law, electronic 
communications with Accenture and its affiliates, including e-mail and instant 
messaging (including content), may be scanned by our systems for the purposes 
of information security and assessment of internal compliance with Accenture 
policy.
______________________________________________________________________________________

www.accenture.com

Reply via email to