Hi Alexander, There is currently no way to create an RDD with more partitions than its parent RDD without causing a shuffle.
However, if the files are splittable, you can set the Hadoop configurations that control split size to something smaller so that the HadoopRDD ends up with more partitions. -Sandy On Thu, Jun 18, 2015 at 2:26 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote: > Hi, > > > > Is there a way to increase the amount of partition of RDD without causing > shuffle? I’ve found JIRA issue > https://issues.apache.org/jira/browse/SPARK-5997 however there is no > implementation yet. > > > > Just in case, I am reading data from ~300 big binary files, which results > in 300 partitions, then I need to sort my RDD, but it crashes with > outofmemory exception. If I change the number of partitions to 2000, sort > works OK, but repartition itself takes a lot of time due to shuffle. > > > > Best regards, Alexander >