This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 137aa8ed6b1e [SPARK-55140][PYTHON][PS] Do not map builtin functions to numpy version for pandas 3
137aa8ed6b1e is described below
commit 137aa8ed6b1e1b7af15b5b888da2be2798f52652
Author: Tian Gao <[email protected]>
AuthorDate: Fri Jan 23 22:00:10 2026 +0800
[SPARK-55140][PYTHON][PS] Do not map builtin functions to numpy version for pandas 3
### What changes were proposed in this pull request?
* Stop mapping builtin functions to their numpy versions on pandas 3.
* On pandas 2, use `is_builtin_func` instead of the even more private
`_builtin_table`.
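The gist of the change can be sketched as a version-gated resolver. The names below (`BUILTIN_TABLE`, `npsum`, `resolve_func`) are illustrative stand-ins for pandas' internal mapping, which pandas < 3 exposes via the private `is_builtin_func`; they are not real pandas or PySpark API:

```python
# Hedged sketch of the version gate this patch adds, without a pandas
# dependency. BUILTIN_TABLE and npsum are illustrative stand-ins, not
# real pandas names.

def npsum(values):
    # Stand-in for numpy.sum in this illustration.
    return sum(values)

BUILTIN_TABLE = {sum: npsum}  # pandas < 3 also maps min and max

def resolve_func(func, pandas_version: str):
    """On pandas < 3, swap a builtin for its vectorized equivalent;
    on pandas >= 3, pass the callable through unchanged."""
    major = int(pandas_version.split(".")[0])
    if major < 3:
        return BUILTIN_TABLE.get(func, func)
    return func

# Pandas 2 swaps the builtin; pandas 3 leaves it alone.
assert resolve_func(sum, "2.2.3") is npsum
assert resolve_func(sum, "3.0.0") is sum
```

Non-builtin callables are unaffected on either version, since the table lookup falls back to the original function.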
### Why are the changes needed?
Pandas 3 no longer maps builtin functions for groupby
(https://github.com/pandas-dev/pandas/issues/53425), and we should match that
behavior. Also, the private API we were using (`_builtin_table`) has been
removed entirely.
### Does this PR introduce _any_ user-facing change?
Yes, the behavior of groupby apply will be slightly different for users on
pandas 3.
### How was this patch tested?
Locally, at least the import error is fixed. CI is pinned at pandas 2.x, so it
should pass.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #53925 from gaogaotiantian/fix-builtin-mapping.
Authored-by: Tian Gao <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
python/pyspark/pandas/groupby.py | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/python/pyspark/pandas/groupby.py b/python/pyspark/pandas/groupby.py
index bdc6a66448e0..b66f079897dc 100644
--- a/python/pyspark/pandas/groupby.py
+++ b/python/pyspark/pandas/groupby.py
@@ -45,7 +45,6 @@ import warnings
import pandas as pd
from pandas.api.types import is_number, is_hashable, is_list_like
-from pandas.core.common import _builtin_table # type: ignore[import-untyped]
from pyspark.sql import Column, DataFrame as SparkDataFrame, Window, functions as F
from pyspark.sql.internal import InternalFunction as SF
@@ -59,6 +58,7 @@ from pyspark.sql.types import (
StringType,
)
from pyspark import pandas as ps  # For running doctests and reference resolution in PyCharm.
+from pyspark.loose_version import LooseVersion
from pyspark.pandas._typing import Axis, FrameLike, Label, Name
from pyspark.pandas.typedef import infer_return_type, DataFrameType, ScalarType, SeriesType
from pyspark.pandas.frame import DataFrame
@@ -1955,11 +1955,17 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta):
psdf, self._groupkeys, agg_columns
)
+ if LooseVersion(pd.__version__) < "3.0.0":
+ from pandas.core.common import is_builtin_func  # type: ignore[import-untyped]
+
+ f = is_builtin_func(func)
+ else:
+ f = func
+
if is_series_groupby:
name = psdf.columns[-1]
- pandas_apply = _builtin_table.get(func, func)
+ pandas_apply = f
else:
- f = _builtin_table.get(func, func)
def pandas_apply(pdf: pd.DataFrame, *a: Any, **k: Any) -> Any:
return f(pdf.drop(groupkey_names, axis=1), *a, **k)
@@ -2182,7 +2188,12 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta):
return pd.DataFrame(pdf.groupby(groupkey_names)[pdf.columns[-1]].filter(func))  # type: ignore[arg-type]
else:
- f = _builtin_table.get(func, func)
+ if LooseVersion(pd.__version__) < "3.0.0":
+ from pandas.core.common import is_builtin_func
+
+ f = is_builtin_func(func)
+ else:
+ f = func
def wrapped_func(pdf: pd.DataFrame) -> pd.DataFrame:
return f(pdf.drop(groupkey_names, axis=1))
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]