Hi Liang-Chi,

Yes, you are right. I implemented the following solution for this problem,
and it works, but I am not sure whether it is efficient:

I double the number of partitions of the parent RDD, and then use the new
partitions together with the parent RDD to construct the target RDD. In the
compute() function of the target RDD, I use the input partition to look up
the corresponding parent partition, and return the matching half of the
elements in that parent partition as the output of the compute function.
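
In code, the idea looks roughly like the sketch below. The names SplitRDD, HalfPartition, and splitHalf are just ones I made up for illustration, and I have only lightly tested this:

```scala
import scala.reflect.ClassTag

import org.apache.spark.{NarrowDependency, Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Illustrative helper: keep the first or second half of a sequence.
object HalfSplit {
  def splitHalf[T](elems: Seq[T], firstHalf: Boolean): Seq[T] = {
    val mid = (elems.length + 1) / 2 // first half gets the extra element on odd sizes
    if (firstHalf) elems.take(mid) else elems.drop(mid)
  }
}

// One child partition covering half of a parent partition (illustrative name).
class HalfPartition(override val index: Int, val parent: Partition)
  extends Partition

// An RDD with twice as many partitions as its parent: children 2i and 2i+1
// each hold one half of parent partition i, via a custom narrow dependency.
class SplitRDD[T: ClassTag](prev: RDD[T])
  extends RDD[T](prev.context, Seq(new NarrowDependency[T](prev) {
    // Child partition i depends only on parent partition i / 2.
    override def getParents(partitionId: Int): Seq[Int] = Seq(partitionId / 2)
  })) {

  override def getPartitions: Array[Partition] =
    firstParent[T].partitions.flatMap { p =>
      Seq[Partition](new HalfPartition(2 * p.index, p),
                     new HalfPartition(2 * p.index + 1, p))
    }

  // Preserve locality: each half prefers the same nodes as its parent partition.
  override def getPreferredLocations(split: Partition): Seq[String] =
    firstParent[T].preferredLocations(split.asInstanceOf[HalfPartition].parent)

  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val hp = split.asInstanceOf[HalfPartition]
    // Materialize the parent partition, then keep only this child's half.
    val elems = firstParent[T].iterator(hp.parent, context).toSeq
    HalfSplit.splitHalf(elems, firstHalf = hp.index % 2 == 0).iterator
  }
}
```

One thing I am unsure about: compute() materializes the whole parent partition for each half, so every parent partition is computed twice unless the parent RDD is cached. That is the part I worry may be inefficient.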

Thanks,
Fei

On Sun, Jan 15, 2017 at 11:01 PM, Liang-Chi Hsieh <vii...@gmail.com> wrote:

>
> Hi,
>
> When calling `coalesce` with `shuffle = false`, it will produce at most
> min(numPartitions, the previous RDD's number of partitions) partitions. So I
> think it can't be used to double the number of partitions.
>
>
> Anastasios Zouzias wrote
> > Hi Fei,
> >
> > Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?
> >
> > https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395
> >
> > coalesce is mostly used for reducing the number of partitions before
> > writing to HDFS, but it might still give you a narrow dependency
> > (satisfying your requirements) if you increase the number of partitions.
> >
> > Best,
> > Anastasios
> >
> > On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu <hufei68@> wrote:
> >
> >> Dear all,
> >>
> >> I want to equally divide a RDD partition into two partitions. That
> means,
> >> the first half of elements in the partition will create a new partition,
> >> and the second half of elements in the partition will generate another
> >> new
> >> partition. But the two new partitions are required to be at the same
> node
> >> with their parent partition, which can help get high data locality.
> >>
> >> Is there anyone who knows how to implement it or any hints for it?
> >>
> >> Thanks in advance,
> >> Fei
> >>
> >>
> >
> >
> > --
> > -- Anastasios Zouzias <azo@.ibm>
>
>
>
>
>
> -----
> Liang-Chi Hsieh | @viirya
> Spark Technology Center
> http://www.spark.tc/
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Equally-split-a-RDD-partition-into-two-partition-at-the-same-node-tp20597p20608.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
