I haven't worked with Datasets, but would this help?
https://stackoverflow.com/questions/37513667/how-to-create-a-spark-dataset-from-an-rdd
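
For reference, here is an untested sketch of how the RDD snippet quoted below might translate to the DataFrame API. As far as I know, Dataset/DataFrame does not accept a custom Partitioner: repartition with column expressions uses hash partitioning, and sortWithinPartitions does the per-partition sort. The sample data, the column names "key" and "value", and the RepartitionSketch object are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repartition-sketch")
      .master("local[*]") // assumption: local run for experimenting
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the key/value pairs held in myRdd.
    val df = Seq(("b", 2), ("a", 1), ("c", 3), ("a", 0)).toDF("key", "value")

    val outputPartitionCount = 300

    val result = df
      // Hash-partitions rows by "key"; this is NOT a custom partitioner.
      .repartition(outputPartitionCount, col("key"))
      // Sorts rows inside each partition without a global shuffle.
      .sortWithinPartitions(col("key"))

    result.show()
    spark.stop()
  }
}
```

If the logic in MyOwnPartitioner is essential, going through df.rdd, mapping to key/value pairs, and calling repartitionAndSortWithinPartitions there should still be possible, at the cost of leaving the DataFrame execution path.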

On Jun 23, 2017 5:43 PM, "Keith Chapman" <keithgchap...@gmail.com> wrote:

> Hi,
>
> I have code that does the following using RDDs,
>
> val outputPartitionCount = 300
> val part = new MyOwnPartitioner(outputPartitionCount)
> val finalRdd = myRdd.repartitionAndSortWithinPartitions(part)
>
> where myRdd is correctly formed as key, value pairs. I am looking to convert
> this to use Dataset/Dataframe instead of RDDs, so my question is:
>
> Is there custom partitioning of Dataset/Dataframe implemented in Spark?
> Can I accomplish the partial sort using mapPartitions on the resulting
> partitioned Dataset/Dataframe?
>
> Any thoughts?
>
> Regards,
> Keith.
>
> http://keith-chapman.com
>
