Hi,

When calling `coalesce` with `shuffle = false`, it produces at most min(numPartitions, the previous RDD's number of partitions) partitions. So I think it can't be used to double the number of partitions.
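A quick way to see this behavior is a small local job (the master URL, app name, and partition counts below are just for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("coalesce-demo"))

    // Start with 4 partitions.
    val rdd = sc.parallelize(1 to 100, numSlices = 4)

    // Asking for more partitions without shuffle is capped at the current count.
    val grown = rdd.coalesce(8, shuffle = false)
    println(grown.getNumPartitions) // still 4, not 8

    // With shuffle = true the count can grow, but it introduces a wide
    // (shuffle) dependency, so the parent-partition locality Fei asked
    // about is not preserved.
    val reshuffled = rdd.coalesce(8, shuffle = true)
    println(reshuffled.getNumPartitions) // 8

    sc.stop()
  }
}
```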
Anastasios Zouzias wrote:
> Hi Fei,
>
> Have you tried coalesce(numPartitions: Int, shuffle: Boolean = false)?
>
> https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L395
>
> coalesce is mostly used for reducing the number of partitions before
> writing to HDFS, but it might still be a narrow dependency (satisfying
> your requirements) if you increase the # of partitions.
>
> Best,
> Anastasios
>
> On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu <hufei68@> wrote:
>
>> Dear all,
>>
>> I want to equally divide an RDD partition into two partitions. That means
>> the first half of the elements in the partition will create a new
>> partition, and the second half will generate another new partition. But
>> the two new partitions are required to be on the same node as their
>> parent partition, which helps achieve high data locality.
>>
>> Is there anyone who knows how to implement this, or any hints for it?
>>
>> Thanks in advance,
>> Fei
>
> --
> Anastasios Zouzias
> <azo@.ibm>

-----
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Equally-split-a-RDD-partition-into-two-partition-at-the-same-node-tp20597p20608.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.