This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 19e3de44ec1d [MINOR][PYTHON][DOCS] Fix a `pandas_udf` example
19e3de44ec1d is described below

commit 19e3de44ec1d83e1cd46e8709d1767dda7d655bb
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Mon Mar 25 08:43:17 2024 +0900

    [MINOR][PYTHON][DOCS] Fix a `pandas_udf` example
    
    ### What changes were proposed in this pull request?
    Fix a `pandas_udf` example
    
    ### Why are the changes needed?
    Checked in both Spark Connect and vanilla Spark: the return type is a struct type rather than an array type.
    
    ```
    In [33]:         >>> @pandas_udf("first string, last string")
        ...:         ... def split_expand(s: pd.Series) -> pd.DataFrame:
        ...:         ...     return s.str.split(expand=True)
        ...:         ...
        ...:         >>> df = spark.createDataFrame([("John Doe",)], ("name",))
        ...:         >>> df.select(split_expand("name")).show()
    +------------------+
    |split_expand(name)|
    +------------------+
    |       {John, Doe}|
    +------------------+
    
    In [34]: df.select(split_expand("name")).printSchema()
    root
     |-- split_expand(name): struct (nullable = true)
     |    |-- first: string (nullable = true)
     |    |-- last: string (nullable = true)
    ```
    
    The other `pandas_udf` examples in this file are fine.
    
    ### Does this PR introduce _any_ user-facing change?
    yes, doc changes
    
    ### How was this patch tested?
    manually checked
    
    ### Was this patch authored or co-authored using generative AI tooling?
    no
    
    Closes #45665 from zhengruifeng/nit_pandas_udf_example.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/sql/pandas/functions.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/sql/pandas/functions.py b/python/pyspark/sql/pandas/functions.py
index dc6fb5a8976d..3ca4a8743d0d 100644
--- a/python/pyspark/sql/pandas/functions.py
+++ b/python/pyspark/sql/pandas/functions.py
@@ -153,7 +153,7 @@ def pandas_udf(f=None, returnType=None, functionType=None):
         +------------------+
         |split_expand(name)|
         +------------------+
-        |       [John, Doe]|
+        |       {John, Doe}|
         +------------------+
 
         This type of Pandas UDF can use keyword arguments:
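
The struct-versus-array distinction corrected above can be seen on the pandas side alone. The following is a minimal sketch, assuming only that pandas is installed; it reuses the `split_expand` body from the docstring without the `@pandas_udf` decorator, and the variable names `names`, `parts`, and `as_lists` are illustrative rather than taken from the commit.

```python
import pandas as pd


def split_expand(s: pd.Series) -> pd.DataFrame:
    # Same body as the docstring example, minus the Spark decorator.
    return s.str.split(expand=True)


names = pd.Series(["John Doe"])

# expand=True produces one column per token, so the result is a DataFrame.
# The declared return type "first string, last string" names two such fields.
parts = split_expand(names)
print(type(parts).__name__)
print(parts.shape)

# Without expand, each row holds a single Python list instead,
# which is the array-like shape the old docstring output suggested.
as_lists = names.str.split()
print(as_lists.iloc[0])
```

Because the function returns a two-column DataFrame, the result column is a struct with two fields, as shown by the `printSchema()` output in the commit message, which `show()` renders as `{John, Doe}`.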

