[
https://issues.apache.org/jira/browse/SPARK-54362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18038475#comment-18038475
]
Tian Gao commented on SPARK-54362:
----------------------------------
[~podongfeng] I know you are working on some related tasks. What are your
thoughts?
Also tagging [~gurwls223].
> Infer returnType from type annotation for Python UDF
> ----------------------------------------------------
>
> Key: SPARK-54362
> URL: https://issues.apache.org/jira/browse/SPARK-54362
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 4.2.0
> Reporter: Tian Gao
> Priority: Major
>
> We should be able to infer returnType from the function definition most of
> the time. This will make life much easier for users.
> A simple example would be:
> {code:python}
> @udf(returnType=IntegerType())
> def f(x):
>     return x + 1
>
> # Could simply be
> @udf
> def f(x) -> int:
>     return x + 1{code}
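As a rough sketch of how this inference could work (not PySpark's actual implementation): read the return annotation with typing.get_type_hints and map it to a DDL type string, which udf already accepts as returnType. The helper name infer_return_type and the mapping table are assumptions made for illustration; mapping int to "int" simply follows the IntegerType example above.

```python
import typing

# Hypothetical mapping from Python annotations to Spark DDL type names.
# Mapping int to "int" mirrors the IntegerType example; it is an assumption.
_PY_TO_DDL = {
    int: "int",
    float: "double",
    str: "string",
    bool: "boolean",
    bytes: "binary",
}


def infer_return_type(func):
    """Return a DDL type string for func's return annotation, or None if absent."""
    hints = typing.get_type_hints(func)
    if "return" not in hints:
        return None  # no annotation: the caller falls back to an explicit returnType
    annotation = hints["return"]
    if annotation not in _PY_TO_DDL:
        raise TypeError(f"cannot infer a Spark type from {annotation!r}")
    return _PY_TO_DDL[annotation]


def f(x) -> int:
    return x + 1


print(infer_return_type(f))
```

A decorator could call such a helper only when returnType is not passed explicitly, so existing code keeps its current behavior.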
> We can also give users a way to specify the integer width that static type
> checkers still accept:
>
>
> {code:python}
> # We define this
> int64 = typing.Annotated[int, 64]
>
> # Now we convert this to 64-bit integer
> @udf
> def f(x) -> int64:
>     return x + 1{code}
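One possible way to read the width back out of such an alias, sketched with the standard typing helpers only. The function name infer_int_type and the width table are hypothetical; the key detail is that get_type_hints drops the Annotated metadata unless include_extras=True is passed.

```python
import typing

# A 64-bit integer alias in the style proposed above.
int64 = typing.Annotated[int, 64]

# Hypothetical mapping from bit width to Spark DDL integer type names.
_INT_WIDTH_TO_DDL = {8: "tinyint", 16: "smallint", 32: "int", 64: "bigint"}


def infer_int_type(func):
    """Return the DDL integer type named by an Annotated[int, <bits>] return annotation."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = typing.get_type_hints(func, include_extras=True)
    annotation = hints.get("return")
    if typing.get_origin(annotation) is not typing.Annotated:
        raise TypeError("expected an Annotated[int, <bits>] return annotation")
    base, *metadata = typing.get_args(annotation)
    if base is not int:
        raise TypeError(f"expected int as the annotated base type, got {base!r}")
    # Use the first metadata item that names a supported width.
    for item in metadata:
        if isinstance(item, int) and item in _INT_WIDTH_TO_DDL:
            return _INT_WIDTH_TO_DDL[item]
    raise TypeError("no supported integer width found in the annotation metadata")


def f(x) -> int64:
    return x + 1


print(infer_int_type(f))
```

Static type checkers treat Annotated[int, 64] as plain int, so the alias stays valid for them while the metadata remains available at runtime.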
>
>
> Furthermore, we should also allow users to return a plain Python object and
> infer the schema from it (a little like Pydantic):
>
> {code:python}
> schema = StructType([
>     StructField("name", StringType(), True),
>     StructField("age", IntegerType(), False)
> ])
>
> @udf(schema)
> def get_person(name, age):
>     return (name, age)
>
> # could be
> @dataclass
> class MyReturnClass:
>     name: str
>     age: Optional[int]
>
> @udf
> def get_person(name, age) -> MyReturnClass:
>     return MyReturnClass(name=name, age=age){code}
> We can basically hide all of our Spark-specific type magic behind the typing
> system that Python users are already familiar with.
>