Tian Gao created SPARK-54362:
--------------------------------
Summary: Infer returnType from type annotation for Python UDF
Key: SPARK-54362
URL: https://issues.apache.org/jira/browse/SPARK-54362
Project: Spark
Issue Type: New Feature
Components: PySpark
Affects Versions: 4.2.0
Reporter: Tian Gao
We should be able to infer returnType from the function definition most of the
time. This would make life much easier for users.
A simple example:
{code:python}
@udf(returnType=IntegerType())
def f(x):
    return x + 1

# Could simply be
@udf
def f(x) -> int:
    return x + 1{code}
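As a sketch of how such inference might work (the helper name {{infer_return_ddl}} and the mapping table are hypothetical, not an existing PySpark API), we can read the return annotation with {{inspect}} and map primitive Python types to the DDL type strings that PySpark already accepts as a {{returnType}}:

{code:python}
import inspect

# Hypothetical mapping from Python annotations to Spark DDL type strings.
# PySpark already accepts DDL strings such as "bigint" as a returnType.
_PRIMITIVE_DDL = {int: "bigint", float: "double", str: "string", bool: "boolean"}

def infer_return_ddl(func):
    """Sketch: read the return annotation and map it to a DDL type string."""
    hint = inspect.signature(func).return_annotation
    if hint is inspect.Signature.empty:
        raise TypeError(f"{func.__name__} has no return annotation")
    try:
        return _PRIMITIVE_DDL[hint]
    except KeyError:
        raise TypeError(f"Cannot infer a Spark type from {hint!r}") from None

def f(x) -> int:
    return x + 1

print(infer_return_ddl(f))  # bigint{code}

A real implementation would of course need a policy for unannotated functions (fall back to the current behavior or raise) and a much richer mapping.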
We can also give users a valid way, permitted by static type checkers, to
specify the width of the integer:
{code:python}
# We define this
int64 = typing.Annotated[int, 64]

# Now we convert this to a 64-bit integer
@udf
def f(x) -> int64:
    return x + 1{code}
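The {{Annotated}} metadata is recoverable at runtime via {{typing.get_type_hints(..., include_extras=True)}}; a minimal sketch (the {{annotated_int_ddl}} helper and the bit-width-to-DDL mapping are illustrative assumptions):

{code:python}
import typing

# Sized integer aliases built on typing.Annotated, as proposed above.
int32 = typing.Annotated[int, 32]
int64 = typing.Annotated[int, 64]

def annotated_int_ddl(hint):
    """Sketch: map an Annotated[int, bits] hint to a DDL type string."""
    base, *meta = typing.get_args(hint)  # e.g. (int, 64)
    assert base is int, "only int widths are handled in this sketch"
    return {32: "int", 64: "bigint"}[meta[0]]

def f(x) -> int64:
    return x + 1

# include_extras=True keeps the Annotated wrapper instead of stripping it
hints = typing.get_type_hints(f, include_extras=True)
print(annotated_int_ddl(hints["return"]))  # bigint{code}

Because {{int64}} is just an annotated {{int}}, static type checkers treat the function body exactly as if it returned a plain {{int}}.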
Furthermore, we should also allow users to return a valid Python object and
infer the schema from it (a little like Pydantic):
{code:python}
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), False)
])

@udf(schema)
def get_person():
    return (name, age)

# could be
from typing import Optional

class MyReturnClass:
    name: str
    age: Optional[int]

@udf
def get_person() -> MyReturnClass:
    return MyReturnClass(name=name, age=age){code}
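Class annotations give us everything a {{StructType}} needs, including nullability from {{Optional}}. A sketch using only the stdlib (the helper {{infer_struct_fields}} and its output shape are hypothetical; a real version would build a {{StructType}} directly):

{code:python}
import typing
from typing import Optional

class MyReturnClass:
    name: str
    age: Optional[int]

_PRIMITIVE_DDL = {int: "bigint", str: "string", float: "double", bool: "boolean"}

def infer_struct_fields(cls):
    """Sketch: derive (name, ddl_type, nullable) triples from class annotations."""
    fields = []
    for name, hint in typing.get_type_hints(cls).items():
        nullable = False
        # Optional[X] is Union[X, None]; unwrap it and mark the field nullable.
        if typing.get_origin(hint) is typing.Union and type(None) in typing.get_args(hint):
            nullable = True
            hint = next(a for a in typing.get_args(hint) if a is not type(None))
        fields.append((name, _PRIMITIVE_DDL[hint], nullable))
    return fields

print(infer_struct_fields(MyReturnClass))
# [('name', 'string', False), ('age', 'bigint', True)]{code}

Nested classes could recurse into nested structs the same way, which is where the Pydantic comparison really pays off.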
We can hide essentially all of our Spark-specific type magic behind the typing
system Python users are already familiar with.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]