This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 5dbd6ff6aa7 [SPARK-44267][PS][INFRA] Upgrade `pandas` to 2.0.3

5dbd6ff6aa7 is described below

commit 5dbd6ff6aa714f0e2e065f41dcb68b7f793caa86
Author: panbingkun <pbk1...@gmail.com>
AuthorDate: Mon Jul 10 15:43:26 2023 +0900

    [SPARK-44267][PS][INFRA] Upgrade `pandas` to 2.0.3

    ### What changes were proposed in this pull request?
    This PR upgrades `pandas` from 2.0.2 to 2.0.3.

    ### Why are the changes needed?
    1. The new version includes several bug fixes, e.g.:
    - Bug in DataFrame.convert_dtypes() and Series.convert_dtypes() when trying to convert [ArrowDtype](https://pandas.pydata.org/docs/reference/api/pandas.ArrowDtype.html#pandas.ArrowDtype) with dtype_backend="numpy_nullable" ([GH53648](https://github.com/pandas-dev/pandas/issues/53648))
    - Bug in [read_csv()](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv) when defining dtype with bool[pyarrow] for the "c" and "python" engines ([GH53390](https://github.com/pandas-dev/pandas/issues/53390))
    2. Release notes: https://pandas.pydata.org/docs/whatsnew/v2.0.3.html

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass GA.

    Closes #41812 from panbingkun/SPARK-44267.
    Authored-by: panbingkun <pbk1...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 dev/infra/Dockerfile                                  | 4 ++--
 python/pyspark/pandas/supported_api_gen.py            | 2 +-
 python/pyspark/pandas/tests/groupby/test_aggregate.py | 5 +++++
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 3b95467389a..af8e1a980f9 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -64,8 +64,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-RUN pypy3 -m pip install numpy 'pandas<=2.0.2' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy pyarrow 'pandas<=2.0.2' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN pypy3 -m pip install numpy 'pandas<=2.0.3' scipy coverage matplotlib
+RUN python3.9 -m pip install numpy pyarrow 'pandas<=2.0.3' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install grpcio protobuf googleapis-common-protos grpcio-status
diff --git a/python/pyspark/pandas/supported_api_gen.py b/python/pyspark/pandas/supported_api_gen.py
index d259171ecb9..06591c5b26a 100644
--- a/python/pyspark/pandas/supported_api_gen.py
+++ b/python/pyspark/pandas/supported_api_gen.py
@@ -98,7 +98,7 @@ def generate_supported_api(output_rst_file_path: str) -> None:
 
     Write supported APIs documentation.
     """
-    pandas_latest_version = "2.0.2"
+    pandas_latest_version = "2.0.3"
     if LooseVersion(pd.__version__) != LooseVersion(pandas_latest_version):
         msg = (
             "Warning: Latest version of pandas (%s) is required to generate the documentation; "
diff --git a/python/pyspark/pandas/tests/groupby/test_aggregate.py b/python/pyspark/pandas/tests/groupby/test_aggregate.py
index bb5b165306d..6ceae82caa8 100644
--- a/python/pyspark/pandas/tests/groupby/test_aggregate.py
+++ b/python/pyspark/pandas/tests/groupby/test_aggregate.py
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 import unittest
+from distutils.version import LooseVersion
 
 import pandas as pd
 
@@ -39,6 +40,10 @@ class GroupbyAggregateMixin:
     def psdf(self):
         return ps.from_pandas(self.pdf)
 
+    @unittest.skipIf(
+        LooseVersion(pd.__version__) >= LooseVersion("2.0.0"),
+        "TODO(SPARK-44289): Enable GroupbyAggregateTests.test_aggregate for pandas 2.0.0.",
+    )
     def test_aggregate(self):
         pdf = pd.DataFrame(
             {"A": [1, 1, 2, 2], "B": [1, 2, 3, 4], "C": [0.362, 0.227, 1.267, -0.562]}
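The test change gates `test_aggregate` on the installed pandas version through `distutils.version.LooseVersion`. Because `distutils` was deprecated by PEP 632 and removed in Python 3.12, here is a minimal, standard-library-only sketch of the same `unittest.skipIf` gate. It is an illustration, not Spark's code: `release_tuple`, `PANDAS_VERSION` and `GroupbyAggregateSketch` are hypothetical names, the hard-coded version string stands in for `pd.__version__` so the sketch runs without pandas, and the parser compares only the leading numeric release components (the third-party `packaging.version.Version` is the more complete option).

```python
import re
import unittest

# Hypothetical stand-in for pd.__version__, hard-coded so the sketch runs without pandas.
PANDAS_VERSION = "2.0.3"


def release_tuple(version: str) -> tuple:
    """Return the numeric release components of `version`, padded to three parts.

    For example "2.0.3" -> (2, 0, 3) and "2.0" -> (2, 0, 0). Anything after the
    leading dotted digits (e.g. "rc1" or "+local") is ignored, so this is coarser
    than packaging.version.Version.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version string: {version!r}")
    parts = [int(piece) for piece in match.group(0).split(".")]
    # Pad so that "2.0" compares equal to (2, 0, 0) instead of less than it.
    parts.extend([0] * (3 - len(parts)))
    return tuple(parts)


class GroupbyAggregateSketch(unittest.TestCase):
    # Skip while the (hypothetical) pandas version is 2.0.0 or newer,
    # mirroring the skipIf condition added in the diff.
    @unittest.skipIf(
        release_tuple(PANDAS_VERSION) >= (2, 0, 0),
        "TODO(SPARK-44289): Enable GroupbyAggregateTests.test_aggregate for pandas 2.0.0.",
    )
    def test_aggregate(self):
        # Placeholder body; the real test compares pandas-on-Spark and pandas results.
        self.assertEqual(sum([1, 2, 3, 4]), 10)
```

With `PANDAS_VERSION = "2.0.3"` the decorated test is reported as skipped; with a 1.x version string such as "1.5.3" the condition is false and the test runs.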