This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new e617503c3f0 [SPARK-40589][PS][TEST] Fix test for `DataFrame.corr_with` skip the pandas regression
e617503c3f0 is described below

commit e617503c3f06be9eea0af529bab7984fc07e87a2
Author: itholic <haejoon....@databricks.com>
AuthorDate: Fri Sep 30 09:45:57 2022 +0800

    [SPARK-40589][PS][TEST] Fix test for `DataFrame.corr_with` skip the pandas regression

    ### What changes were proposed in this pull request?

    This PR proposes to skip the `DataFrame.corrwith` test when `other` is a `pyspark.pandas.Series` and `method` is "spearman" or "pearson", since there is a regression in pandas 1.5.0 for those cases.

    ### Why are the changes needed?

    There are some regressions in pandas 1.5.0, so we're not going to match the behavior for those cases.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Manually tested with pandas 1.5.0 and confirmed that the test passes.

    Closes #38031 from itholic/SPARK-40589.

    Authored-by: itholic <haejoon....@databricks.com>
    Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
---
 python/pyspark/pandas/tests/test_dataframe.py | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py
index 5da0974c906..dfac3c6d1b8 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -6076,7 +6076,14 @@ class DataFrameTest(ComparisonTestBase, SQLTestUtils):
     def _test_corrwith(self, psdf, psobj):
         pdf = psdf.to_pandas()
         pobj = psobj.to_pandas()
-        for method in ["pearson", "spearman", "kendall"]:
+        # Regression in pandas 1.5.0 when other is Series and method is "pearson" or "spearman"
+        # See https://github.com/pandas-dev/pandas/issues/48826 for the reported issue,
+        # and https://github.com/pandas-dev/pandas/pull/46174 for the initial PR that causes.
+        if LooseVersion(pd.__version__) >= LooseVersion("1.5.0") and isinstance(pobj, pd.Series):
+            methods = ["kendall"]
+        else:
+            methods = ["pearson", "spearman", "kendall"]
+        for method in methods:
             for drop in [True, False]:
                 p_corr = pdf.corrwith(pobj, drop=drop, method=method)
                 ps_corr = psdf.corrwith(psobj, drop=drop, method=method)
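
For readers who want to try the gating logic from this patch outside the Spark test suite, the following is a minimal standalone sketch. It is not the code in `test_dataframe.py`: the names `parse_version` and `corrwith_methods_to_test` are hypothetical, and the hand-rolled version parser is an assumption made here so that the example depends on neither `distutils.LooseVersion` nor pandas.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn a version string such as '1.5.0' into a comparable 3-tuple of ints.

    Hypothetical helper for illustration only. Suffixes are ignored, so
    '1.5.0rc1' compares equal to '1.5.0'.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '2.0' so that tuple comparison behaves as expected.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def corrwith_methods_to_test(pandas_version: str, other_is_series: bool) -> list:
    """Return the correlation methods that the comparison test should exercise.

    Mirrors the condition in the patch: from pandas 1.5.0 onward, when `other`
    is a Series, only "kendall" is compared against pandas.
    """
    if parse_version(pandas_version) >= (1, 5, 0) and other_is_series:
        return ["kendall"]
    return ["pearson", "spearman", "kendall"]


if __name__ == "__main__":
    for version, is_series in [("1.4.4", True), ("1.5.0", True), ("1.5.0", False)]:
        print(version, is_series, corrwith_methods_to_test(version, is_series))
```

Like the patch, this sketch sets no upper bound on the version, so the "pearson" and "spearman" comparisons stay skipped for Series inputs on every pandas release from 1.5.0 onward until the condition is revisited.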