(spark) branch master updated: [MINOR][DOCS][PYTHON] Fix grouped aggregate pandas UDF example in df.groupby.agg

dongjoon Mon, 11 Nov 2024 08:18:25 -0800

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 6dd5c5fb5754 [MINOR][DOCS][PYTHON] Fix grouped aggregate pandas UDF example in df.groupby.agg
6dd5c5fb5754 is described below

commit 6dd5c5fb5754c77b4686f6e1b60759c8ffdfc871
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Mon Nov 11 08:18:07 2024 -0800

    [MINOR][DOCS][PYTHON] Fix grouped aggregate pandas UDF example in df.groupby.agg
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to fix the grouped aggregate pandas UDF example in `df.groupby.agg` by using Python type hints instead of `PandasUDFType.GROUPED_AGG`.
    
    ### Why are the changes needed?
    
    To avoid encouraging users to use the old `PandasUDFType` style.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, it fixes the user-facing documentation.
    
    ### How was this patch tested?
    
    Manually ran the example.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #48809 from HyukjinKwon/minor-fix-docstring.
    
    Authored-by: Hyukjin Kwon <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 python/pyspark/sql/group.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/sql/group.py b/python/pyspark/sql/group.py
index 94b4b64a0b6f..2e6941e48541 100644
--- a/python/pyspark/sql/group.py
+++ b/python/pyspark/sql/group.py
@@ -126,8 +126,9 @@ class GroupedData(PandasGroupedOpsMixin):
 
         Examples
         --------
+        >>> import pandas as pd  # doctest: +SKIP
         >>> from pyspark.sql import functions as sf
-        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+        >>> from pyspark.sql.functions import pandas_udf
         >>> df = spark.createDataFrame(
         ...      [(2, "Alice"), (3, "Alice"), (5, "Bob"), (10, "Bob")], ["age", "name"])
         >>> df.show()
@@ -165,8 +166,8 @@ class GroupedData(PandasGroupedOpsMixin):
 
         Same as above but uses pandas UDF.
 
-        >>> @pandas_udf('int', PandasUDFType.GROUPED_AGG)  # doctest: +SKIP
-        ... def min_udf(v):
+        >>> @pandas_udf('int')  # doctest: +SKIP
+        ... def min_udf(v: pd.Series) -> int:
         ...     return v.min()
         ...
         >>> df.groupBy(df.name).agg(min_udf(df.age)).sort("name").show()  # doctest: +SKIP
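
For readers who want to try the type-hinted style outside the docstring, here is a minimal sketch. It reuses the data from the example above. The names `min_age`, `pdf`, `result`, and `run_with_spark` are illustrative and not part of the patch. The pandas `groupby` call only emulates what a grouped aggregate UDF computes (one Series per group in, one scalar out); it is not how Spark runs the UDF. The Spark part is wrapped in a function, so it runs only if you pass it a SparkSession with PyArrow installed.

```python
import pandas as pd


# Same hints as the updated docstring: a pandas Series in, a scalar out.
# That Series -> scalar shape is what marks a grouped aggregate pandas UDF.
def min_age(v: pd.Series) -> int:
    return int(v.min())


# Same rows as the docstring example, held in a local pandas DataFrame.
pdf = pd.DataFrame(
    {"age": [2, 3, 5, 10], "name": ["Alice", "Alice", "Bob", "Bob"]}
)

# Local emulation: apply the function to each group's "age" Series.
result = pdf.groupby("name")["age"].apply(min_age).to_dict()
print(result)


def run_with_spark(spark):
    """Run the same aggregation in PySpark (assumes an active SparkSession)."""
    from pyspark.sql.functions import pandas_udf

    # No PandasUDFType argument: the UDF kind is inferred from the type hints.
    min_udf = pandas_udf(min_age, "int")
    df = spark.createDataFrame(pdf)
    return df.groupBy(df.name).agg(min_udf(df.age)).sort("name")
```

With a session available, `run_with_spark(spark).show()` should print one row per name with the minimum age, matching the local result.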

