Most of the RDD methods that shuffle data take a Partitioner as a parameter, but rdd.distinct does not have such a signature.
Should I open a PR for that?

    /**
     * Return a new RDD containing the distinct elements in this RDD.
     */
    def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
      map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)
    }
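For anyone who wants this behaviour before such an overload exists, here is a minimal sketch of the same logic written against the public RDD API. The object name `DistinctWithPartitioner` and the helper `distinctWith` are hypothetical names chosen for illustration and are not part of Spark; the local master, the sample data, and the choice of `HashPartitioner(4)` are likewise assumptions.

```scala
import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object DistinctWithPartitioner {
  // Hypothetical helper (not a Spark API): pair each element with null,
  // collapse duplicates with reduceByKey under the given partitioner,
  // then drop the dummy value again.
  def distinctWith[T: ClassTag](rdd: RDD[T], partitioner: Partitioner): RDD[T] =
    rdd.map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master is enough for the demo.
    val conf = new SparkConf().setMaster("local[2]").setAppName("distinct-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(1, 2, 2, 3, 3, 3))
      val result = distinctWith(rdd, new HashPartitioner(4))
      println(result.getNumPartitions)               // 4
      println(result.collect().sorted.mkString(",")) // 1,2,3
    } finally {
      sc.stop()
    }
  }
}
```

One thing to keep in mind: the final `map(_._1)` does not preserve partitioning, so the returned RDD keeps the partition count chosen by the partitioner but its `partitioner` field is `None`.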