[spark] branch master updated: [SPARK-45247][BUILD][PYTHON][PS] Upgrade Pandas to 2.1.1

dongjoon Fri, 22 Sep 2023 05:38:58 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 4d435637351 [SPARK-45247][BUILD][PYTHON][PS] Upgrade Pandas to 2.1.1
4d435637351 is described below

commit 4d435637351e067dea4f441179473b087f60cc16
Author: Haejoon Lee <[email protected]>
AuthorDate: Fri Sep 22 05:38:22 2023 -0700

    [SPARK-45247][BUILD][PYTHON][PS] Upgrade Pandas to 2.1.1
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to upgrade Pandas to 2.1.1.
    
    See https://pandas.pydata.org/docs/dev/whatsnew/v2.1.1.html for details.
    
    ### Why are the changes needed?
    
    Pandas 2.1.1 has been released, and we should support the latest Pandas.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    The existing CI should pass.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #43025 from itholic/pandas_2.1.1.
    
    Authored-by: Haejoon Lee <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 dev/infra/Dockerfile                                 |  4 ++--
 python/pyspark/pandas/frame.py                       | 12 +++++++++---
 python/pyspark/pandas/supported_api_gen.py           |  2 +-
 python/pyspark/pandas/tests/frame/test_reindexing.py |  3 +++
 4 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 767606d299a..d816ec5ec1b 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -84,8 +84,8 @@ RUN Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='ht
 # See more in SPARK-39735
 ENV R_LIBS_SITE "/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library"
 
-RUN pypy3 -m pip install numpy 'pandas<=2.1.0' scipy coverage matplotlib
-RUN python3.9 -m pip install numpy pyarrow 'pandas<=2.1.0' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
+RUN pypy3 -m pip install numpy 'pandas<=2.1.1' scipy coverage matplotlib
+RUN python3.9 -m pip install numpy pyarrow 'pandas<=2.1.1' scipy unittest-xml-reporting plotly>=4.8 'mlflow>=2.3.1' coverage matplotlib openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 'protobuf==3.20.3' 'googleapis-common-protos==1.56.4'
diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index 26cb15417f5..08450c0be87 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -11355,7 +11355,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
                 if len(index_scols) == 1:
                     if len(items) <= ps.get_option("compute.isin_limit"):
                         col = index_scols[0].isin([F.lit(item) for item in items])
-                        return DataFrame(self._internal.with_filter(col))
+                        result: DataFrame = DataFrame(self._internal.with_filter(col))
                     else:
                         item_sdf_col = verify_temp_column_name(
                             self._internal.spark_frame, "__item__"
@@ -11369,7 +11369,10 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
                             how="semi",
                         )
 
-                        return DataFrame(self._internal.with_new_sdf(joined_sdf))
+                        result = DataFrame(self._internal.with_new_sdf(joined_sdf))
+
+                    result.index.name = None
+                    return result
 
                 else:
                     # for multi-index
@@ -11389,7 +11392,10 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
                             col = midx_col
                         else:
                             col = col | midx_col
-                    return DataFrame(self._internal.with_filter(col))
+
+                    result = DataFrame(self._internal.with_filter(col))
+                    result.index.names = [None] * result.index.nlevels
+                    return result
             else:
                 return self[items]
         elif like is not None:
diff --git a/python/pyspark/pandas/supported_api_gen.py b/python/pyspark/pandas/supported_api_gen.py
index dfcd4267b41..f00757fe366 100644
--- a/python/pyspark/pandas/supported_api_gen.py
+++ b/python/pyspark/pandas/supported_api_gen.py
@@ -98,7 +98,7 @@ def generate_supported_api(output_rst_file_path: str) -> None:
 
     Write supported APIs documentation.
     """
-    pandas_latest_version = "2.1.0"
+    pandas_latest_version = "2.1.1"
     if LooseVersion(pd.__version__) != LooseVersion(pandas_latest_version):
         msg = (
             "Warning: Latest version of pandas (%s) is required to generate the documentation; "
diff --git a/python/pyspark/pandas/tests/frame/test_reindexing.py b/python/pyspark/pandas/tests/frame/test_reindexing.py
index 606efd95188..3e40c35edd6 100644
--- a/python/pyspark/pandas/tests/frame/test_reindexing.py
+++ b/python/pyspark/pandas/tests/frame/test_reindexing.py
@@ -856,6 +856,9 @@ class FrameReindexingMixin:
 
 
 class FrameReidexingTests(FrameReindexingMixin, ComparisonTestBase, SQLTestUtils):
+    def test_filter(self):
+        super().test_filter()
+
     pass
 
 
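
The frame.py hunks above clear the index name(s) on the filtered result before returning it. The following is a minimal sketch of those two assignments, written against plain pandas rather than pandas-on-Spark. The sample frames and the variable names `single` and `multi` are illustrative assumptions and do not appear in the commit.

```python
import pandas as pd

# Single-level index: a named index whose name is cleared on the result,
# mirroring `result.index.name = None` in the diff.
df = pd.DataFrame(
    {"a": [1, 2, 3]},
    index=pd.Index(["x", "y", "z"], name="idx"),
)
single = df.loc[["x", "z"]].copy()
single.index.name = None

# Multi-level index: every level name is cleared in one assignment,
# mirroring `result.index.names = [None] * result.index.nlevels`.
midx = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["k1", "k2"])
mdf = pd.DataFrame({"v": [10, 20]}, index=midx)
multi = mdf.copy()
multi.index.names = [None] * multi.index.nlevels

print(single.index.name)
print(list(multi.index.names))
```

The sketch only shows how the names are reset; it does not reproduce the Spark-side filtering or join that builds `result` in the actual method.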


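The commit touches two different kinds of version constraint: the Dockerfile installs with the upper bound `pandas<=2.1.1`, while supported_api_gen.py warns unless the installed pandas equals `pandas_latest_version` exactly. The sketch below contrasts the two checks using only the standard library. The helper names `parse_version`, `within_upper_bound`, and `matches_docs_pin` are hypothetical, and the parsing is a simplification that handles purely numeric dotted versions; pip follows PEP 440 and the Spark code uses `LooseVersion`.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted release string such as '2.1.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def within_upper_bound(installed: str, upper: str = "2.1.1") -> bool:
    """Approximate the Dockerfile specifier 'pandas<=2.1.1' (hypothetical helper)."""
    return parse_version(installed) <= parse_version(upper)


def matches_docs_pin(installed: str, pinned: str = "2.1.1") -> bool:
    """Approximate the equality check on pandas_latest_version (hypothetical helper)."""
    return parse_version(installed) == parse_version(pinned)


if __name__ == "__main__":
    for candidate in ("2.1.0", "2.1.1", "2.2.0"):
        print(candidate, within_upper_bound(candidate), matches_docs_pin(candidate))
```

Under these assumptions an older release such as 2.1.0 still satisfies the install constraint but would trigger the documentation warning, which is why both places are bumped together.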
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
