This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a0ccdf27e5ff [SPARK-47824][PS] Fix nondeterminism in pyspark.pandas.series.asof
a0ccdf27e5ff is described below

commit a0ccdf27e5ff30817b8f058f08f98d5b44bad2db
Author: Mark Jarvin <mark.jar...@databricks.com>
AuthorDate: Fri Apr 12 09:37:19 2024 +0900

    [SPARK-47824][PS] Fix nondeterminism in pyspark.pandas.series.asof

    ### What changes were proposed in this pull request?

    Use the monotonically increasing ID column as the sorting condition for `max_by` instead of a literal string.

    ### Why are the changes needed?

    https://github.com/apache/spark/pull/35191 had an error where the literal string `"__monotonically_increasing_id__"` was used as the tie-breaker in `max_by` instead of the actual ID.

    ### Does this PR introduce _any_ user-facing change?

    Fixes nondeterminism in `asof`.

    ### How was this patch tested?

    In some circumstances, `//python:pyspark.pandas.tests.connect.series.test_parity_as_of` is sufficient to reproduce the issue.

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #46018 from markj-db/SPARK-47824.

    Authored-by: Mark Jarvin <mark.jar...@databricks.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/pandas/series.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py
index 98818a368a9f..8edc2c531b51 100644
--- a/python/pyspark/pandas/series.py
+++ b/python/pyspark/pandas/series.py
@@ -5870,7 +5870,7 @@ class Series(Frame, IndexOpsMixin, Generic[T]):
                     # then return monotonically_increasing_id. This will let max by
                     # to return last index value, which is the behaviour of pandas
                     else spark_column.isNotNull(),
-                    monotonically_increasing_id_column,
+                    F.col(monotonically_increasing_id_column),
                 ),
             )
             for index in where
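
The effect of the one-line change can be illustrated without a Spark session. The following is a minimal pure-Python sketch, not the pyspark.pandas implementation: `Row`, `max_by`, and `asof` here are hypothetical stand-ins, and the rule that the first row seen wins a tie is a modelling assumption rather than documented Spark behaviour. It shows that when every qualifying row carries the same literal string as its ordering key, the result depends on the order in which rows are visited, whereas ordering by the actual ID yields the same answer for any order.

```python
from typing import Callable, Iterable, NamedTuple, Optional

# The column *name* as a plain string; the bug used this as the ordering key.
LITERAL = "__monotonically_increasing_id__"


class Row(NamedTuple):
    row_id: int              # stands in for the monotonically increasing ID
    index: int               # the Series index value
    value: Optional[float]   # the Series value (None models a null)


def max_by(rows: Iterable[Row],
           value_of: Callable[[Row], object],
           key_of: Callable[[Row], object]) -> object:
    """Return the value of the row with the largest non-None key.

    Assumption: on ties the first row seen is kept, so the outcome
    depends on iteration order whenever keys are equal.
    """
    best_key = None
    best_value = None
    for row in rows:
        key = key_of(row)
        if key is None:
            continue  # rows that fail the condition are ignored
        if best_key is None or key > best_key:
            best_key, best_value = key, value_of(row)
    return best_value


def asof(rows: Iterable[Row], where: int, use_real_id: bool) -> object:
    """Hypothetical model of the asof lookup for a single `where` value."""
    def key_of(row: Row) -> object:
        if row.index <= where and row.value is not None:
            # Before the fix every qualifying row got the same literal key.
            return row.row_id if use_real_id else LITERAL
        return None
    return max_by(rows, lambda r: r.value, key_of)


rows = [Row(0, 10, 1.0), Row(1, 20, 2.0), Row(2, 30, 3.0)]
reordered = list(reversed(rows))  # models a different partition/scan order

buggy_results = {asof(rows, 25, False), asof(reordered, 25, False)}
fixed_results = {asof(rows, 25, True), asof(reordered, 25, True)}

print("literal key:", sorted(buggy_results))
print("real ID key:", sorted(fixed_results))
```

In this model the literal key produces two different answers across the two row orders, while the real ID always selects the last row whose index is at most `where`, which is the pandas behaviour the code comment in the diff describes.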