This is an automated email from the ASF dual-hosted git repository.
srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
from f6c4e58b85d [SPARK-40407][SQL] Fix the potential data skew caused by df.repartition
add 08678456d16 [SPARK-40476][ML][SQL] Reduce the shuffle size of ALS
No new revisions were added by this update.
Summary of changes:
.../org/apache/spark/ml/recommendation/ALS.scala   |  18 ++--
.../ml/recommendation/TopByKeyAggregator.scala     |  59 -----------
.../spark/ml/recommendation/CollectTopKSuite.scala | 111 +++++++++++++++++++++
.../recommendation/TopByKeyAggregatorSuite.scala   |  73 --------------
.../catalyst/expressions/aggregate/collect.scala   |  46 ++++++++-
.../scala/org/apache/spark/sql/functions.scala     |   3 +
6 files changed, 169 insertions(+), 141 deletions(-)
delete mode 100644 mllib/src/main/scala/org/apache/spark/ml/recommendation/TopByKeyAggregator.scala
create mode 100644 mllib/src/test/scala/org/apache/spark/ml/recommendation/CollectTopKSuite.scala
delete mode 100644 mllib/src/test/scala/org/apache/spark/ml/recommendation/TopByKeyAggregatorSuite.scala