[ https://issues.apache.org/jira/browse/SPARK-32384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wuyi resolved SPARK-32384.
--------------------------
    Fix Version/s: 3.2.0
         Assignee: zhengruifeng
       Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/31480

> repartitionAndSortWithinPartitions avoid shuffle with same partitioner
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32384
>                 URL: https://issues.apache.org/jira/browse/SPARK-32384
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: zhengruifeng
>            Assignee: zhengruifeng
>            Priority: Minor
>             Fix For: 3.2.0
>
>
> In {{combineByKeyWithClassTag}}, there is a check that skips the shuffle
> when the requested partitioner is the same as the RDD's partitioner:
> {code:java}
> if (self.partitioner == Some(partitioner)) {
>   self.mapPartitions(iter => {
>     val context = TaskContext.get()
>     new InterruptibleIterator(context, aggregator.combineValuesByKey(iter, context))
>   }, preservesPartitioning = true)
> } else {
>   new ShuffledRDD[K, V, C](self, partitioner)
>     .setSerializer(serializer)
>     .setAggregator(aggregator)
>     .setMapSideCombine(mapSideCombine)
> }
> {code}
>
> {{repartitionAndSortWithinPartitions}} can skip the shuffle in the same
> case.
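
For illustration only, the same check can be sketched at user level for the sort case. This is not the code from the pull request above: the object and method names (SortWithinPartitionsSketch, sortWithinPartitions) are hypothetical, and sorting each partition in memory is a simplifying assumption that only suits partitions small enough to buffer. It assumes Spark Core is on the classpath.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper, not part of Spark.
object SortWithinPartitionsSketch {

  def sortWithinPartitions[K: Ordering: ClassTag, V: ClassTag](
      rdd: RDD[(K, V)],
      partitioner: Partitioner): RDD[(K, V)] = {
    if (rdd.partitioner == Some(partitioner)) {
      // Same partitioner: every record is already in its target partition,
      // so sort each partition in place and keep the partitioner.
      // Assumption: a partition fits in memory (toVector buffers it).
      rdd.mapPartitions(
        iter => iter.toVector.sortBy(_._1).iterator,
        preservesPartitioning = true)
    } else {
      // Different (or no) partitioner: fall back to the shuffle-based API.
      rdd.repartitionAndSortWithinPartitions(partitioner)
    }
  }
}
```

For example, an RDD produced by partitionBy(new HashPartitioner(4)) takes the first branch when the helper is called with another HashPartitioner(4), because HashPartitioner equality depends only on the number of partitions.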