This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 6adabbd76bad [SPARK-55490][PS][FOLLOW-UP] Fix `groupby(as_index=False).agg` with dict
6adabbd76bad is described below
commit 6adabbd76bad7a3155398ff4de6085d34dc9e693
Author: Takuya Ueshin <[email protected]>
AuthorDate: Wed Feb 18 11:46:21 2026 -0800
[SPARK-55490][PS][FOLLOW-UP] Fix `groupby(as_index=False).agg` with dict
### What changes were proposed in this pull request?
This is a follow-up of apache/spark#54276.
It fixes `groupby(as_index=False).agg` when the aggregation is given as a dict.
### Why are the changes needed?
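apache/spark#54276 did not cover `groupby(as_index=False).agg` with a dict.
With a derived group key such as `psdf.A + 1`, pandas API on Spark currently drops the key column from the result: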
```py
>>> psdf = ps.DataFrame(
...     {"A": [1, 1, 2, 2], "B": [1, 2, 3, 4], "C": [0.362, 0.227, 1.267, -0.562]}
... )
>>> psdf.groupby(psdf.A, as_index=False).agg({"B": "min", "C": "sum"})
   A  B      C
0  1  1  0.589
1  2  3  0.705
>>>
>>> psdf.groupby(psdf.A + 1, as_index=False).agg({"B": "min", "C": "sum"})
   B      C
0  1  0.589
1  3  0.705
```
whereas pandas 3 keeps the key column:
```py
>>> pdf = pd.DataFrame(
...     {"A": [1, 1, 2, 2], "B": [1, 2, 3, 4], "C": [0.362, 0.227, 1.267, -0.562]}
... )
>>> pdf.groupby(pdf.A, as_index=False).agg({"B": "min", "C": "sum"})
   A  B      C
0  1  1  0.589
1  2  3  0.705
>>>
>>> pdf.groupby(pdf.A + 1, as_index=False).agg({"B": "min", "C": "sum"})
   A  B      C
0  2  1  0.589
1  3  3  0.705
```
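The difference comes down to which column names are compared against the group key names when deciding which keys to drop. The following is a minimal plain-Python sketch of that check, not the actual pandas-on-Spark code: `keys_to_drop` is a hypothetical helper, and the list `["A", "B", "C"]` assumes that the aggregation columns for the derived key `psdf.A + 1` include column `A`.

```python
def keys_to_drop(group_key_names, column_names):
    """Return the positions of group keys whose name collides with a column name."""
    return {i for i, name in enumerate(group_key_names) if name in column_names}


func_or_funcs = {"B": "min", "C": "sum"}
group_key_names = ["A"]  # name carried by the derived key psdf.A + 1

# Assumption: comparing against every aggregation column, which includes "A".
dropped_by_all_columns = keys_to_drop(group_key_names, ["A", "B", "C"])

# Comparing only against the dict keys, as `set(func_or_funcs)` does.
dropped_by_dict_keys = keys_to_drop(group_key_names, set(func_or_funcs))

print(dropped_by_all_columns)  # {0}
print(dropped_by_dict_keys)    # set()
```

Under these assumptions, restricting the comparison to the dict keys leaves the `A` key in place, which matches the pandas 3 output above.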
### Does this PR introduce _any_ user-facing change?
Yes. `groupby(as_index=False).agg` with a dict will behave more like pandas 3.
### How was this patch tested?
The existing tests should pass.
### Was this patch authored or co-authored using generative AI tooling?
Codex (GPT-5.3-Codex)
Closes #54352 from ueshin/issuse/SPARK-55490/as_index_dict.
Authored-by: Takuya Ueshin <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
python/pyspark/pandas/groupby.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/python/pyspark/pandas/groupby.py b/python/pyspark/pandas/groupby.py
index 37993e5f2499..f9e8123555ad 100644
--- a/python/pyspark/pandas/groupby.py
+++ b/python/pyspark/pandas/groupby.py
@@ -329,7 +329,7 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta):
i for i, gkey in enumerate(self._groupkeys) if gkey._psdf is not self._psdf
)
else:
- column_names = [column.name for column in self._agg_columns]
+ column_names = set(func_or_funcs)
should_drop_index = set(
i for i, gkey in enumerate(self._groupkeys) if gkey.name in column_names
)