Hi,

When `coalesce` is called with `shuffle = false`, it produces at most
min(numPartitions, the previous RDD's number of partitions) partitions. So I
think it can't be used to double the number of partitions.


Anastasios Zouzias wrote
> Hi Fei,
> 
> Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?
> 
> https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395
> 
> coalesce is mostly used for reducing the number of partitions before
> writing to HDFS, but it might still be a narrow dependency (satisfying
> your
> requirements) if you increase the # of partitions.
> 
> Best,
> Anastasios
> 
> On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu <hufei68@> wrote:
> 
>> Dear all,
>>
>> I want to equally divide an RDD partition into two partitions. That is,
>> the first half of the elements in the partition should form one new
>> partition, and the second half should form another new partition. The two
>> new partitions are required to be on the same node as their parent
>> partition, which helps achieve high data locality.
>>
>> Is there anyone who knows how to implement it or any hints for it?
>>
>> Thanks in advance,
>> Fei
>>
>>
> 
> 
> -- 
> -- Anastasios Zouzias
> <azo@.ibm>
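
Regarding the original question above: one possible direction is a custom RDD
with a narrow dependency that maps each parent partition to two child
partitions and reuses the parent's preferred locations. This is a rough,
untested sketch (the HalfPartition/SplitRDD names are made up for
illustration, and preferred locations are only a scheduler hint, not a
guarantee):

  import scala.reflect.ClassTag
  import org.apache.spark.{NarrowDependency, Partition, TaskContext}
  import org.apache.spark.rdd.RDD

  // Child partitions 2*i and 2*i+1 both come from parent partition i.
  class HalfPartition(override val index: Int, val parentIndex: Int, val firstHalf: Boolean)
    extends Partition

  class SplitRDD[T: ClassTag](parent: RDD[T])
    extends RDD[T](parent.context, Seq(new NarrowDependency[T](parent) {
      // Each child depends on exactly one parent partition: a narrow dependency.
      override def getParents(partitionId: Int): Seq[Int] = Seq(partitionId / 2)
    })) {

    override protected def getPartitions: Array[Partition] =
      parent.partitions.flatMap { p =>
        Seq[Partition](new HalfPartition(2 * p.index, p.index, firstHalf = true),
                       new HalfPartition(2 * p.index + 1, p.index, firstHalf = false))
      }

    // Reuse the parent's preferred locations so both halves favor the same node.
    override protected def getPreferredLocations(split: Partition): Seq[String] = {
      val hp = split.asInstanceOf[HalfPartition]
      parent.preferredLocations(parent.partitions(hp.parentIndex))
    }

    override def compute(split: Partition, context: TaskContext): Iterator[T] = {
      val hp = split.asInstanceOf[HalfPartition]
      // Materializes the parent partition to split it evenly; each half's task
      // re-reads the whole parent partition, so this is only a sketch.
      val elems = parent.iterator(parent.partitions(hp.parentIndex), context).toArray
      val (first, second) = elems.splitAt(elems.length / 2)
      (if (hp.firstHalf) first else second).iterator
    }
  }

With something like `new SplitRDD(rdd)` the child RDD has twice as many
partitions and a narrow lineage, but the even split requires buffering each
parent partition in memory.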





-----
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
