Thanks for the pointer, Saliya. I'm looking for an equivalent API in Dataset/DataFrame for repartitionAndSortWithinPartitions; I've already converted most of the RDDs to DataFrames.
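
For context, the nearest built-in calls I know of are repartition followed by sortWithinPartitions. As far as I can tell they partition by hashing column expressions rather than accepting a custom Partitioner, so this is not a drop-in replacement for MyOwnPartitioner. A minimal sketch, assuming a DataFrame with hypothetical columns "key" and "value" standing in for the converted myRdd, and a local SparkSession purely for illustration:

```scala
import org.apache.spark.sql.SparkSession

object RepartitionSortSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would reuse its own session.
    val spark = SparkSession.builder()
      .appName("repartition-sort-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for myRdd after conversion to a DataFrame.
    val df = Seq(("b", 2), ("a", 1), ("c", 3), ("a", 0)).toDF("key", "value")

    val outputPartitionCount = 300

    val result = df
      // Hash-partitions rows on "key"; no custom Partitioner is involved here.
      .repartition(outputPartitionCount, $"key")
      // Sorts rows inside each partition without a global sort.
      .sortWithinPartitions($"key")

    result.show()
    spark.stop()
  }
}
```

If the custom partitioning logic can be expressed as a column (for example, a derived partition id computed by a UDF), repartitioning on that column might approximate the RDD behaviour, though I have not verified that it places rows the same way.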

Regards,
Keith.

http://keith-chapman.com

On Sat, Jun 24, 2017 at 3:48 AM, Saliya Ekanayake <esal...@gmail.com> wrote:

> I haven't worked with datasets but would this help
> https://stackoverflow.com/questions/37513667/how-to-create-a-spark-dataset-from-an-rdd?
>
> On Jun 23, 2017 5:43 PM, "Keith Chapman" <keithgchap...@gmail.com> wrote:
>
>> Hi,
>>
>> I have code that does the following using RDDs,
>>
>> val outputPartitionCount = 300
>> val part = new MyOwnPartitioner(outputPartitionCount)
>> val finalRdd = myRdd.repartitionAndSortWithinPartitions(part)
>>
>> where myRdd is correctly formed as key, value pairs. I am looking to convert
>> this to use Dataset/Dataframe instead of RDDs, so my question is:
>>
>> Is there custom partitioning of Dataset/Dataframe implemented in Spark?
>> Can I accomplish the partial sort using mapPartitions on the resulting
>> partitioned Dataset/Dataframe?
>>
>> Any thoughts?
>>
>> Regards,
>> Keith.
>>
>> http://keith-chapman.com