This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new dcccbf4f9dd [SPARK-39807][PYTHON][PS] Respect Series.concat sort parameter to follow 1.4.3 behavior

dcccbf4f9dd is described below

commit dcccbf4f9ddd22dc59e6199a940625f677b23a81
Author: Yikun Jiang <yikunk...@gmail.com>
AuthorDate: Tue Jul 19 09:34:32 2022 +0900

[SPARK-39807][PYTHON][PS] Respect Series.concat sort parameter to follow 1.4.3 behavior

### What changes were proposed in this pull request?
Respect the Series.concat sort parameter in the `num_series == 1` case to follow pandas 1.4.3 behavior.

### Why are the changes needed?
In https://github.com/apache/spark/pull/36711, we followed the pandas 1.4.2 behavior of respecting the Series.concat sort parameter, except in the `num_series == 1` case. [pandas 1.4.3](https://github.com/pandas-dev/pandas/releases/tag/v1.4.3) fixed https://github.com/pandas-dev/pandas/issues/47127, which also resolved the `num_series == 1` bug, so this PR follows the pandas 1.4.3 behavior.

### Does this PR introduce _any_ user-facing change?
Yes, this case is already covered in https://github.com/apache/spark/blob/master/python/docs/source/migration_guide/pyspark_3.3_to_3.4.rst:
```
In Spark 3.4, the Series.concat sort parameter will be respected to follow pandas 1.4 behaviors.
```

### How was this patch tested?
- CI passed
- test_concat_index_axis passed with pandas 1.3.5, 1.4.2, and 1.4.3.

Closes #37217 from Yikun/SPARK-39807.
Authored-by: Yikun Jiang <yikunk...@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/pyspark/pandas/namespace.py            |  5 ++---
 python/pyspark/pandas/tests/test_namespace.py | 20 +++++++++++---------
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/python/pyspark/pandas/namespace.py b/python/pyspark/pandas/namespace.py
index 7691bf465e7..0f0dc606c52 100644
--- a/python/pyspark/pandas/namespace.py
+++ b/python/pyspark/pandas/namespace.py
@@ -2621,9 +2621,8 @@ def concat(
 
         assert len(merged_columns) > 0
 
-        # If sort is True, always sort when there are more than two Series,
-        # and if there is only one Series, never sort to follow pandas 1.4+ behavior.
-        if sort and num_series != 1:
+        # If sort is True, always sort
+        if sort:
             # FIXME: better ordering
             merged_columns = sorted(merged_columns, key=name_like_string)
 
diff --git a/python/pyspark/pandas/tests/test_namespace.py b/python/pyspark/pandas/tests/test_namespace.py
index 4db756c6e66..ac033f7828b 100644
--- a/python/pyspark/pandas/tests/test_namespace.py
+++ b/python/pyspark/pandas/tests/test_namespace.py
@@ -334,19 +334,21 @@ class NamespaceTest(PandasOnSparkTestCase, SQLTestUtils):
             ([psdf.reset_index(), psdf], [pdf.reset_index(), pdf]),
             ([psdf, psdf[["C", "A"]]], [pdf, pdf[["C", "A"]]]),
             ([psdf[["C", "A"]], psdf], [pdf[["C", "A"]], pdf]),
-            # only one Series
-            ([psdf, psdf["C"]], [pdf, pdf["C"]]),
-            ([psdf["C"], psdf], [pdf["C"], pdf]),
             # more than two Series
             ([psdf["C"], psdf, psdf["A"]], [pdf["C"], pdf, pdf["A"]]),
         ]
-        if LooseVersion(pd.__version__) >= LooseVersion("1.4"):
-            # more than two Series
-            psdfs, pdfs = ([psdf, psdf["C"], psdf["A"]], [pdf, pdf["C"], pdf["A"]])
-            for ignore_index, join, sort in itertools.product(ignore_indexes, joins, sorts):
-                # See also https://github.com/pandas-dev/pandas/issues/47127
-                if (join, sort) != ("outer", True):
+        # See also https://github.com/pandas-dev/pandas/issues/47127
+        if LooseVersion(pd.__version__) >= LooseVersion("1.4.3"):
+            series_objs = [
+                # more than two Series
+                ([psdf, psdf["C"], psdf["A"]], [pdf, pdf["C"], pdf["A"]]),
+                # only one Series
+                ([psdf, psdf["C"]], [pdf, pdf["C"]]),
+                ([psdf["C"], psdf], [pdf["C"], pdf]),
+            ]
+            for psdfs, pdfs in series_objs:
+                for ignore_index, join, sort in itertools.product(ignore_indexes, joins, sorts):
                     self.assert_eq(
                         ps.concat(psdfs, ignore_index=ignore_index, join=join, sort=sort),
                         pd.concat(pdfs, ignore_index=ignore_index, join=join, sort=sort),

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org