Most of the RDD methods that shuffle data, such as reduceByKey, accept a Partitioner as a parameter.

But rdd.distinct has no such overload; it only accepts a number of partitions.

Should I open a PR for that?

/**
 * Return a new RDD containing the distinct elements in this RDD,
 * shuffled with the given partitioner.
 */
def distinct(partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T] = withScope {
  map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1)
}
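The proposal is correct because reduceByKey with a given partitioner routes every copy of a value to the same partition, so deduplicating within each partition is enough to deduplicate globally. A minimal plain-Scala sketch of that reasoning, assuming no Spark on the classpath: `SimpleHashPartitioner` and `distinctWith` are hypothetical names, and the partitioner only imitates the non-negative-modulo-of-hashCode behavior of Spark's HashPartitioner on local collections.

```scala
object DistinctByPartitionerSketch {
  // Hypothetical stand-in for Spark's HashPartitioner: null goes to partition 0,
  // everything else to a non-negative modulo of its hashCode.
  final case class SimpleHashPartitioner(numPartitions: Int) {
    require(numPartitions > 0, "numPartitions must be positive")
    def getPartition(key: Any): Int = key match {
      case null => 0
      case k =>
        val raw = k.hashCode % numPartitions
        if (raw < 0) raw + numPartitions else raw
    }
  }

  // Local simulation of map(x => (x, null)).reduceByKey(partitioner, (x, y) => x).map(_._1):
  // "shuffle" each element to the partition its value hashes to, then keep one copy
  // of each value per partition. Equal values always hash to the same partition.
  def distinctWith[T](input: Seq[Seq[T]], partitioner: SimpleHashPartitioner): Vector[Vector[T]] = {
    val shuffled = Vector.fill(partitioner.numPartitions)(Vector.newBuilder[T])
    for (part <- input; x <- part) shuffled(partitioner.getPartition(x)) += x
    shuffled.map(_.result().distinct)
  }

  def main(args: Array[String]): Unit = {
    // Two input "partitions" with duplicates spread across them.
    val input = Seq(Seq(1, 2, 3, 2), Seq(3, 4, 1, 5))
    val out = distinctWith(input, SimpleHashPartitioner(2))
    out.zipWithIndex.foreach { case (p, i) => println(s"partition $i: ${p.mkString(", ")}") }
  }
}
```

With two partitions, the even values end up in partition 0 as `2, 4` and the odd values in partition 1 as `1, 3, 5`; each distinct value appears in exactly one output partition. Unlike Spark's reduceByKey, this sketch does no map-side combining.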
