[https://issues.apache.org/jira/browse/SPARK-54362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18038475#comment-18038475]

Tian Gao commented on SPARK-54362:
----------------------------------

[~podongfeng] I know you are working on some related tasks. What are your
thoughts?

Also tagging [~gurwls223].

> Infer returnType from type annotation for Python UDF
> ----------------------------------------------------
>
>                 Key: SPARK-54362
>                 URL: https://issues.apache.org/jira/browse/SPARK-54362
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 4.2.0
>            Reporter: Tian Gao
>            Priority: Major
>
> We should be able to infer returnType from the function definition most of
> the time. This would make life much easier for users.
> A simple example:
> {code:python}
> from pyspark.sql.functions import udf
> from pyspark.sql.types import IntegerType
>
> @udf(returnType=IntegerType())
> def f(x):
>     return x + 1
> # Could simply be
> @udf
> def f(x) -> int:
>     return x + 1
> {code}
> We can also use a trick that lets users specify the size of the int in a way
> that static type checkers accept:
>
> {code:python}
> import typing
> from pyspark.sql.functions import udf
>
> # We define this
> int64 = typing.Annotated[int, 64]
> # Now the return type is converted to a 64-bit integer (LongType)
> @udf
> def f(x) -> int64:
>     return x + 1
> {code}
>
> Furthermore, we should also allow users to return a plain Python object and
> infer the schema from its class (a little bit like Pydantic):
>
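> {code:python}
> from dataclasses import dataclass
> from typing import Optional
> from pyspark.sql.functions import udf
> from pyspark.sql.types import IntegerType, StringType, StructField, StructType
>
> schema = StructType([
>     StructField("name", StringType(), False),
>     StructField("age", IntegerType(), True)
> ])
> @udf(schema)
> def get_person(name, age):
>     return (name, age)
> # could be
> @dataclass
> class MyReturnClass:
>     name: str
>     age: Optional[int]
> @udf
> def get_person(name, age) -> MyReturnClass:
>     return MyReturnClass(name=name, age=age)
> {code}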
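Here is a sketch of how the class annotations might be walked to recover each field's name, type and nullability. It assumes `Optional[X]` means a nullable field of type `X` and that every other annotation is non-nullable. `infer_struct_fields` is a hypothetical helper. Each tuple it returns lines up with the arguments of `StructField(name, dataType, nullable)` once the DDL type name is turned into a `DataType`.

```python
import dataclasses
import typing
from typing import Optional

# Hypothetical scalar mapping from Python builtins to Spark DDL type names.
_SCALAR_DDL = {int: "int", float: "double", str: "string", bool: "boolean", bytes: "binary"}


def infer_struct_fields(cls):
    """Return (name, ddl_type, nullable) for each annotated attribute of cls.

    Hypothetical helper; Optional[X] becomes a nullable X, anything else is non-nullable.
    """
    fields = []
    for name, hint in typing.get_type_hints(cls).items():
        args = typing.get_args(hint)
        # Optional[X] is Union[X, None]; peel off the None member.
        if typing.get_origin(hint) is typing.Union and type(None) in args:
            non_none = [a for a in args if a is not type(None)]
            if len(non_none) != 1:
                raise TypeError(f"{name}: only Optional[X] unions are supported")
            hint, nullable = non_none[0], True
        else:
            nullable = False
        if hint not in _SCALAR_DDL:
            raise TypeError(f"{name}: cannot infer a Spark type from {hint!r}")
        fields.append((name, _SCALAR_DDL[hint], nullable))
    return fields


@dataclasses.dataclass
class MyReturnClass:
    name: str
    age: Optional[int]


print(infer_struct_fields(MyReturnClass))
# [('name', 'string', False), ('age', 'int', True)]
```

This yields a non-nullable string `name` and a nullable int `age`. Nested classes, lists and `X | None` syntax are deliberately left out of this sketch.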
> We can hide nearly all of our Spark-specific type machinery behind the typing
> system that Python users are already familiar with.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
