You can do that with mapPartitions pretty easily, can’t you?

On Wed, Jan 31, 2018 at 11:08 PM Ruifeng Zheng <ruife...@foxmail.com> wrote:
> Hi all,
>
> 1. The Dataset API supports “sortWithinPartitions”, but the RDD API has no
> counterpart. (I know there is “repartitionAndSortWithinPartitions”, but I
> don’t want to repartition the RDD.) I have to convert the RDD to a Dataset
> to get this function. Would it make sense to add “sortWithinPartitions”
> to RDD?
>
> 2. In “aggregateByKey”/“reduceByKey”, I want to do some special operation
> (like aggregator compression) after local aggregation on each partition.
> A similar case: when computing ‘ApproximatePercentile’ for different keys
> with “reduceByKey”, it may help if ‘QuantileSummaries#compress’ is called
> before network communication. So I wonder whether it would be useful to
> add an ‘aggregateWithinPartitions’ to RDD?
>
> Regards,
> Ruifeng
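
For what it's worth, here is a rough sketch of how both requests might look with mapPartitions. It is written in PySpark style rather than Scala, and the two partition functions are plain Python so they can be tried without a cluster. The names sort_partition and aggregate_within_partition, and the merge/compress callables, are hypothetical helpers made up for illustration; they are not part of Spark. The second function assumes the values are already summary objects (the reduceByKey shape) and that merge still works on compressed summaries.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def sort_partition(
    records: Iterable[V], key: Optional[Callable[[V], object]] = None
) -> Iterator[V]:
    """Sort the records of a single partition.

    Note: sorted() materializes the whole partition in memory, so this
    sketch only suits partitions that fit on the executor heap.
    """
    return iter(sorted(records, key=key))


def aggregate_within_partition(
    records: Iterable[Tuple[K, V]],
    merge: Callable[[V, V], V],
    compress: Callable[[V], V],
) -> Iterator[Tuple[K, V]]:
    """Merge values per key inside one partition, then compress each result.

    `merge` and `compress` are placeholders for whatever the summary type
    provides (for example, a quantile summary's merge and compress steps).
    """
    local: Dict[K, V] = {}
    for k, value in records:
        local[k] = merge(local[k], value) if k in local else value
    return ((k, compress(acc)) for k, acc in local.items())


# With PySpark available, the wiring would look roughly like this
# (rdd and pairs are assumed to be existing RDDs):
#
#   sorted_rdd = rdd.mapPartitions(sort_partition, preservesPartitioning=True)
#
#   summaries = (
#       pairs.mapPartitions(
#           lambda it: aggregate_within_partition(it, merge, compress),
#           preservesPartitioning=True,
#       ).reduceByKey(merge)
#   )
```

Neither step triggers a shuffle on its own; only the final reduceByKey moves data, and by then each partition sends at most one compressed summary per key.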