This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 6dd5c5fb5754 [MINOR][DOCS][PYTHON] Fix grouped aggregate pandas UDF example in df.groupby.agg
6dd5c5fb5754 is described below
commit 6dd5c5fb5754c77b4686f6e1b60759c8ffdfc871
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Mon Nov 11 08:18:07 2024 -0800
[MINOR][DOCS][PYTHON] Fix grouped aggregate pandas UDF example in df.groupby.agg
### What changes were proposed in this pull request?
This PR proposes to fix the grouped aggregate pandas UDF example in
`df.groupby.agg` by using Python type hints instead of `PandasUDFType.GROUPED_AGG`.
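With type hints, a pandas UDF annotated as `pd.Series -> int` is a Series-to-scalar (grouped aggregate) UDF: it receives one group's column values as a `pd.Series` and returns a single value. The sketch below illustrates only that per-group contract in plain pandas, without Spark. `min_udf` reuses the docstring's body and hints but omits `@pandas_udf`. The use of pandas `groupby(...).agg(...)` as a stand-in for Spark's `groupBy().agg()` is an assumption made for illustration. It is not how Spark executes the UDF.

```python
import pandas as pd


# Same body and type hints as the docstring's min_udf, but without the
# @pandas_udf decorator so it runs with plain pandas (no Spark session).
def min_udf(v: pd.Series) -> int:
    # Series-to-scalar: one group's "age" values in, a single value out.
    return v.min()


# The example rows from the docstring.
pdf = pd.DataFrame(
    [(2, "Alice"), (3, "Alice"), (5, "Bob"), (10, "Bob")],
    columns=["age", "name"],
)

# Stand-in for df.groupBy(df.name).agg(min_udf(df.age)).sort("name"):
# pandas passes each group's "age" Series to min_udf, and sort_index orders by name.
result = pdf.groupby("name")["age"].agg(min_udf).sort_index()
print(result)
```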
### Why are the changes needed?
To avoid encouraging users to use the old `PandasUDFType`-based style.
### Does this PR introduce _any_ user-facing change?
Yes, it fixes the user-facing documentation.
### How was this patch tested?
Manually ran the example.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #48809 from HyukjinKwon/minor-fix-docstring.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
python/pyspark/sql/group.py | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/python/pyspark/sql/group.py b/python/pyspark/sql/group.py
index 94b4b64a0b6f..2e6941e48541 100644
--- a/python/pyspark/sql/group.py
+++ b/python/pyspark/sql/group.py
@@ -126,8 +126,9 @@ class GroupedData(PandasGroupedOpsMixin):
Examples
--------
+ >>> import pandas as pd # doctest: +SKIP
>>> from pyspark.sql import functions as sf
- >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
+ >>> from pyspark.sql.functions import pandas_udf
>>> df = spark.createDataFrame(
... [(2, "Alice"), (3, "Alice"), (5, "Bob"), (10, "Bob")], ["age", "name"])
>>> df.show()
@@ -165,8 +166,8 @@ class GroupedData(PandasGroupedOpsMixin):
Same as above but uses pandas UDF.
- >>> @pandas_udf('int', PandasUDFType.GROUPED_AGG) # doctest: +SKIP
- ... def min_udf(v):
+ >>> @pandas_udf('int') # doctest: +SKIP
+ ... def min_udf(v: pd.Series) -> int:
... return v.min()
...
>>> df.groupBy(df.name).agg(min_udf(df.age)).sort("name").show()  # doctest: +SKIP