[spark] branch master updated: [SPARK-43453][PS] Ignore the `names` of `MultiIndex` when `axis=1` for `concat`

dongjoon Tue, 19 Sep 2023 10:41:23 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 9fb44456b097 [SPARK-43453][PS] Ignore the `names` of `MultiIndex` when `axis=1` for `concat`
9fb44456b097 is described below

commit 9fb44456b09702a7224c548e9081655bb30dc517
Author: Haejoon Lee <[email protected]>
AuthorDate: Tue Sep 19 10:41:03 2023 -0700

    [SPARK-43453][PS] Ignore the `names` of `MultiIndex` when `axis=1` for `concat`
    
    ### What changes were proposed in this pull request?
    
    This PR updates `ps.concat` to match the pandas behavior for `MultiIndex` columns when `axis=1`, and enables the corresponding tests.
    
    ### Why are the changes needed?
    
    To match the behavior of the latest pandas (the affected test was skipped for pandas 2.0.0 and later).
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. Given two DataFrames with `MultiIndex` columns:
    ```python
    >>> psdf3
    X   X
    AB  A  B
    1   0  1
    2   2  3
    3   4  5
    >>> psdf4
    Y   X
    CD  C  D
    1   1  4
    3   2  5
    5   3  6
    ```
    The behavior of `ps.concat` with `axis=1` is changed:
    
    **Before**
    ```python
    >>> ps.concat([psdf3, psdf4], axis=1)
    X     X
    AB    A    B    C    D
    1   0.0  1.0  1.0  4.0
    2   2.0  3.0  NaN  NaN
    3   4.0  5.0  2.0  5.0
    5   NaN  NaN  3.0  6.0
    ```
    
    **After (Ignore the names of MultiIndex columns to follow the latest Pandas)**
    ```python
    >>> ps.concat([psdf3, psdf4], axis=1)
         X
         A    B    C    D
    1  0.0  1.0  1.0  4.0
    2  2.0  3.0  NaN  NaN
    3  4.0  5.0  2.0  5.0
    5  NaN  NaN  3.0  6.0
    ```
    
    ### How was this patch tested?
    
    By enabling the existing test `test_concat_column_axis`, which was previously skipped for pandas 2.0.0 and later.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #42991 from itholic/SPARK-43453.
    
    Authored-by: Haejoon Lee <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 python/pyspark/pandas/namespace.py                     | 4 ++++
 python/pyspark/pandas/tests/test_ops_on_diff_frames.py | 5 -----
 2 files changed, 4 insertions(+), 5 deletions(-)
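
The core of the change is a single `rename_axis` call that clears every level name on `MultiIndex` columns. The sketch below illustrates that technique with plain pandas rather than pandas-on-Spark, so it runs without a Spark session. The helper name `drop_column_level_names` and the frame `pdf` are hypothetical and exist only for illustration; they are not part of the patch.

```python
import pandas as pd

# Hypothetical frame that mirrors psdf3 from the commit message, built with plain pandas.
columns = pd.MultiIndex.from_tuples([("X", "A"), ("X", "B")], names=["X", "AB"])
pdf = pd.DataFrame([[0, 1], [2, 3], [4, 5]], index=[1, 2, 3], columns=columns)


def drop_column_level_names(df):
    """Return df with all column-level names cleared when its columns are a MultiIndex."""
    cols = df.columns
    if isinstance(cols, pd.MultiIndex):
        # One None per level resets every level name; the column labels are untouched.
        return df.rename_axis([None] * cols.nlevels, axis="columns")
    return df


cleared = drop_column_level_names(pdf)
print(list(pdf.columns.names))      # names on the input frame
print(list(cleared.columns.names))  # names after clearing
```

Only the level names change; the column labels and the data stay the same, and the input frame is left unmodified because `rename_axis` returns a new object by default.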

diff --git a/python/pyspark/pandas/namespace.py b/python/pyspark/pandas/namespace.py
index 2f951608b727..f7c07b37c166 100644
--- a/python/pyspark/pandas/namespace.py
+++ b/python/pyspark/pandas/namespace.py
@@ -2564,6 +2564,10 @@ def concat(
         if sort:
             concat_psdf = concat_psdf.sort_index()
 
+        columns = concat_psdf.columns
+        if isinstance(columns, pd.MultiIndex):
+            concat_psdf = concat_psdf.rename_axis([None] * columns.nlevels, axis="columns")
+
         return concat_psdf
 
     # Series, Series ...
diff --git a/python/pyspark/pandas/tests/test_ops_on_diff_frames.py b/python/pyspark/pandas/tests/test_ops_on_diff_frames.py
index 612af9a020ff..f39a3c4a0abc 100644
--- a/python/pyspark/pandas/tests/test_ops_on_diff_frames.py
+++ b/python/pyspark/pandas/tests/test_ops_on_diff_frames.py
@@ -493,11 +493,6 @@ class OpsOnDiffFramesEnabledTestsMixin:
             ),
         )
 
-    @unittest.skipIf(
-        LooseVersion(pd.__version__) >= LooseVersion("2.0.0"),
-        "TODO(SPARK-43453): Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis "
-        "for pandas 2.0.0.",
-    )
     def test_concat_column_axis(self):
         pdf1 = pd.DataFrame({"A": [0, 2, 4], "B": [1, 3, 5]}, index=[1, 2, 3])
         pdf1.columns.names = ["AB"]


