This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 9fb44456b097 [SPARK-43453][PS] Ignore the `names` of `MultiIndex` when `axis=1` for `concat`
9fb44456b097 is described below
commit 9fb44456b09702a7224c548e9081655bb30dc517
Author: Haejoon Lee <[email protected]>
AuthorDate: Tue Sep 19 10:41:03 2023 -0700
[SPARK-43453][PS] Ignore the `names` of `MultiIndex` when `axis=1` for `concat`
### What changes were proposed in this pull request?
This PR updates `ps.concat` to match the pandas behavior and enables the corresponding tests.
### Why are the changes needed?
To follow the behavior of pandas 2.0.0 and later.
### Does this PR introduce _any_ user-facing change?
Given two DataFrames with `MultiIndex` columns:
```python
>>> psdf3
X   X
AB  A  B
1   0  1
2   2  3
3   4  5
>>> psdf4
Y   X
CD  C  D
1   1  4
3   2  5
5   3  6
```
The behavior of `ps.concat` with `axis=1` is changed:
**Before**
```python
>>> ps.concat([psdf3, psdf4], axis=1)
X     X
AB    A    B    C    D
1   0.0  1.0  1.0  4.0
2   2.0  3.0  NaN  NaN
3   4.0  5.0  2.0  5.0
5   NaN  NaN  3.0  6.0
```
**After** (the names of the `MultiIndex` columns are ignored, following pandas 2.0.0 and later)
```python
>>> ps.concat([psdf3, psdf4], axis=1)
      X
      A    B    C    D
1   0.0  1.0  1.0  4.0
2   2.0  3.0  NaN  NaN
3   4.0  5.0  2.0  5.0
5   NaN  NaN  3.0  6.0
```
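The example can be approximated in plain pandas to see the effect of the new post-processing step. This is a sketch under assumptions: plain `pandas` DataFrames stand in for `psdf3`/`psdf4` so it runs without a Spark session, and the level names `["X", "AB"]` and `["Y", "CD"]` are read from the printed frames above. Whether `pd.concat` itself keeps the column names can depend on the pandas version, so the sketch applies the same `rename_axis` call as the patch explicitly.

```python
import pandas as pd

# Plain-pandas stand-ins for psdf3 and psdf4 (assumption: level names and
# labels are read from the printed frames above).
pdf3 = pd.DataFrame(
    [[0, 1], [2, 3], [4, 5]],
    index=[1, 2, 3],
    columns=pd.MultiIndex.from_tuples([("X", "A"), ("X", "B")], names=["X", "AB"]),
)
pdf4 = pd.DataFrame(
    [[1, 4], [2, 5], [3, 6]],
    index=[1, 3, 5],
    columns=pd.MultiIndex.from_tuples([("X", "C"), ("X", "D")], names=["Y", "CD"]),
)

# Outer join on the row index: rows 2 and 5 each miss one side, hence the NaNs.
result = pd.concat([pdf3, pdf4], axis=1)

# The step the patch adds to ps.concat: clear every level name of the
# MultiIndex columns, whatever pd.concat itself did with them.
if isinstance(result.columns, pd.MultiIndex):
    result = result.rename_axis([None] * result.columns.nlevels, axis="columns")

print(result)
```

Clearing the names unconditionally, rather than only when they differ between inputs, keeps the result independent of how a given pandas version merges level names.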
### How was this patch tested?
Enabled the existing test `test_concat_column_axis`, which was previously skipped for pandas 2.0.0 and later.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #42991 from itholic/SPARK-43453.
Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
python/pyspark/pandas/namespace.py | 4 ++++
python/pyspark/pandas/tests/test_ops_on_diff_frames.py | 5 -----
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/python/pyspark/pandas/namespace.py b/python/pyspark/pandas/namespace.py
index 2f951608b727..f7c07b37c166 100644
--- a/python/pyspark/pandas/namespace.py
+++ b/python/pyspark/pandas/namespace.py
@@ -2564,6 +2564,10 @@ def concat(
         if sort:
             concat_psdf = concat_psdf.sort_index()
 
+        columns = concat_psdf.columns
+        if isinstance(columns, pd.MultiIndex):
+            concat_psdf = concat_psdf.rename_axis([None] * columns.nlevels, axis="columns")
+
         return concat_psdf
 
     # Series, Series ...
diff --git a/python/pyspark/pandas/tests/test_ops_on_diff_frames.py b/python/pyspark/pandas/tests/test_ops_on_diff_frames.py
index 612af9a020ff..f39a3c4a0abc 100644
--- a/python/pyspark/pandas/tests/test_ops_on_diff_frames.py
+++ b/python/pyspark/pandas/tests/test_ops_on_diff_frames.py
@@ -493,11 +493,6 @@ class OpsOnDiffFramesEnabledTestsMixin:
),
)
 
-    @unittest.skipIf(
-        LooseVersion(pd.__version__) >= LooseVersion("2.0.0"),
-        "TODO(SPARK-43453): Enable OpsOnDiffFramesEnabledTests.test_concat_column_axis "
-        "for pandas 2.0.0.",
-    )
     def test_concat_column_axis(self):
         pdf1 = pd.DataFrame({"A": [0, 2, 4], "B": [1, 3, 5]}, index=[1, 2, 3])
         pdf1.columns.names = ["AB"]
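The guard added to `namespace.py` can be sketched as a standalone function on plain pandas objects. `drop_multiindex_column_names` is a hypothetical name used only for illustration, not part of pandas or pyspark.pandas; the patch applies the same `rename_axis` call inline on the pandas-on-Spark result. The sketch shows that only `MultiIndex` columns are affected and single-level column names are left alone.

```python
import pandas as pd


def drop_multiindex_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the patch: clear the level names of
    MultiIndex columns and return single-level columns unchanged."""
    columns = df.columns
    if isinstance(columns, pd.MultiIndex):
        # rename_axis returns a new DataFrame; the input keeps its names.
        return df.rename_axis([None] * columns.nlevels, axis="columns")
    return df


# MultiIndex columns: every level name is cleared.
multi = pd.DataFrame(
    [[0, 1]],
    columns=pd.MultiIndex.from_tuples([("X", "A"), ("X", "B")], names=["X", "AB"]),
)
cleared = drop_multiindex_column_names(multi)
print(list(cleared.columns.names))  # [None, None]

# Single-level columns: the name is kept as-is.
flat = pd.DataFrame([[0, 1]], columns=pd.Index(["A", "B"], name="AB"))
print(drop_multiindex_column_names(flat).columns.name)  # AB
```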