This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 137aa8ed6b1e [SPARK-55140][PYTHON][PS] Do not map builtin functions to numpy version for pandas 3
137aa8ed6b1e is described below
commit 137aa8ed6b1e1b7af15b5b888da2be2798f52652
Author: Tian Gao <[email protected]>
AuthorDate: Fri Jan 23 22:00:10 2026 +0800
[SPARK-55140][PYTHON][PS] Do not map builtin functions to numpy version for pandas 3
### What changes were proposed in this pull request?
* Stop mapping builtin functions to their numpy versions on pandas 3.
* On pandas 2, use `is_builtin_func` instead of the even more private
`_builtin_table`.
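The gist of the change can be sketched as a version-gated resolver. The names below (`BUILTIN_TABLE`, `npsum`, `resolve_func`) are illustrative stand-ins for pandas' internal mapping, which pandas < 3 exposes via the private `is_builtin_func`; they are not real pandas or PySpark API:

```python
# Hedged sketch of the version gate this patch adds, without a pandas
# dependency. BUILTIN_TABLE and npsum are illustrative stand-ins, not
# real pandas names.

def npsum(values):
    # Stand-in for numpy.sum in this illustration.
    return sum(values)

BUILTIN_TABLE = {sum: npsum}  # pandas < 3 also maps min and max

def resolve_func(func, pandas_version: str):
    """On pandas < 3, swap a builtin for its vectorized equivalent;
    on pandas >= 3, pass the callable through unchanged."""
    major = int(pandas_version.split(".")[0])
    if major < 3:
        return BUILTIN_TABLE.get(func, func)
    return func

# Pandas 2 swaps the builtin; pandas 3 leaves it alone.
assert resolve_func(sum, "2.2.3") is npsum
assert resolve_func(sum, "3.0.0") is sum
```

Non-builtin callables are unaffected on either version, since the table lookup falls back to the original function.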
### Why are the changes needed?
Pandas 3 no longer maps builtin functions for groupby
(https://github.com/pandas-dev/pandas/issues/53425), and we should match that
behavior. Also, the private API we were using (`_builtin_table`) has been
removed entirely.
### Does this PR introduce _any_ user-facing change?
Yes, the behavior of groupby apply will be slightly different for users on
pandas 3.
### How was this patch tested?
Locally, at least the import error is fixed. CI is pinned at pandas 2.x, so it
should pass.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #53925 from gaogaotiantian/fix-builtin-mapping.
Authored-by: Tian Gao <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
python/pyspark/pandas/groupby.py | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/python/pyspark/pandas/groupby.py b/python/pyspark/pandas/groupby.py
index bdc6a66448e0..b66f079897dc 100644
--- a/python/pyspark/pandas/groupby.py
+++ b/python/pyspark/pandas/groupby.py
@@ -45,7 +45,6 @@ import warnings
import pandas as pd
from pandas.api.types import is_number, is_hashable, is_list_like
-from pandas.core.common import _builtin_table # type: ignore[import-untyped]
from pyspark.sql import Column, DataFrame as SparkDataFrame, Window, functions as F
from pyspark.sql.internal import InternalFunction as SF
@@ -59,6 +58,7 @@ from pyspark.sql.types import (
StringType,
)
from pyspark import pandas as ps  # For running doctests and reference resolution in PyCharm.
+from pyspark.loose_version import LooseVersion
from pyspark.pandas._typing import Axis, FrameLike, Label, Name
from pyspark.pandas.typedef import infer_return_type, DataFrameType, ScalarType, SeriesType
from pyspark.pandas.frame import DataFrame
@@ -1955,11 +1955,17 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta):
psdf, self._groupkeys, agg_columns
)
+ if LooseVersion(pd.__version__) < "3.0.0":
+ from pandas.core.common import is_builtin_func  # type: ignore[import-untyped]
+
+ f = is_builtin_func(func)
+ else:
+ f = func
+
if is_series_groupby:
name = psdf.columns[-1]
- pandas_apply = _builtin_table.get(func, func)
+ pandas_apply = f
else:
- f = _builtin_table.get(func, func)
def pandas_apply(pdf: pd.DataFrame, *a: Any, **k: Any) -> Any:
return f(pdf.drop(groupkey_names, axis=1), *a, **k)
@@ -2182,7 +2188,12 @@ class GroupBy(Generic[FrameLike], metaclass=ABCMeta):
return pd.DataFrame(pdf.groupby(groupkey_names)[pdf.columns[-1]].filter(func))  # type: ignore[arg-type]
else:
- f = _builtin_table.get(func, func)
+ if LooseVersion(pd.__version__) < "3.0.0":
+ from pandas.core.common import is_builtin_func
+
+ f = is_builtin_func(func)
+ else:
+ f = func
def wrapped_func(pdf: pd.DataFrame) -> pd.DataFrame:
return f(pdf.drop(groupkey_names, axis=1))
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]