Tian Gao created SPARK-54362:
--------------------------------

             Summary: Infer returnType from type annotation for Python UDF
                 Key: SPARK-54362
                 URL: https://issues.apache.org/jira/browse/SPARK-54362
             Project: Spark
          Issue Type: New Feature
          Components: PySpark
    Affects Versions: 4.2.0
            Reporter: Tian Gao


We should be able to infer returnType from the function definition most of the
time. This would make life much easier for users.

A simple example would be
{code:python}
@udf(returnType=IntegerType())
def f(x):
    return x + 1

# Could simply be
@udf
def f(x) -> int:
    return x + 1{code}
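As an illustration of what the decorator could do internally, the inference can start from {{typing.get_type_hints}} plus a small lookup table. This is only a sketch: {{infer_return_type}}, the table, and the choice of mapping Python {{int}} to Spark's {{bigint}} are assumptions for illustration, not the proposed implementation.

```python
import typing

# Hypothetical mapping from builtin annotations to Spark DDL type strings;
# a real implementation would live inside pyspark's udf decorator and
# produce DataType instances rather than strings.
_PY_TO_DDL = {int: "bigint", float: "double", str: "string", bool: "boolean"}

def infer_return_type(func):
    """Infer a Spark DDL return-type string from a function's return annotation."""
    hints = typing.get_type_hints(func)
    ret = hints.get("return")
    if ret is None:
        raise TypeError(f"{func.__name__} has no return annotation")
    try:
        return _PY_TO_DDL[ret]
    except KeyError:
        raise TypeError(f"Cannot infer a Spark type for {ret!r}") from None

def f(x) -> int:
    return x + 1
```

Returning DDL strings keeps the sketch dependency-free, since pyspark already accepts DDL strings as a returnType.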
We can also provide helper aliases so users can specify the width of an
integer in a way that static type checkers accept:
{code:python}
# We define this
int64 = typing.Annotated[int, 64]

# Now we convert this to 64-bit integer
@udf
def f(x) -> int64:
    return x + 1{code}
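The width tag can be read back out of the annotation with {{typing.get_origin}} and {{typing.get_args}}. A minimal sketch, assuming the {{int32}}/{{int64}} aliases and the {{infer_sized_int}} helper are hypothetical names, not pyspark API:

```python
import typing

# Hypothetical width-tagged aliases (illustrative, not pyspark API).
int32 = typing.Annotated[int, 32]
int64 = typing.Annotated[int, 64]

# Assumed mapping from bit width to Spark DDL integer type names.
_BITS_TO_DDL = {8: "tinyint", 16: "smallint", 32: "int", 64: "bigint"}

def infer_sized_int(annotation):
    """Map Annotated[int, bits] to a Spark DDL integer type name (sketch)."""
    if typing.get_origin(annotation) is typing.Annotated:
        base, *meta = typing.get_args(annotation)
        if base is int and meta and meta[0] in _BITS_TO_DDL:
            return _BITS_TO_DDL[meta[0]]
    if annotation is int:
        # Assumption: plain int defaults to 64-bit, matching Spark's own
        # schema inference for Python ints.
        return "bigint"
    raise TypeError(f"Cannot infer an integer type from {annotation!r}")
```

Because {{Annotated[int, 64]}} is still {{int}} to a type checker, the function body type-checks unchanged while the decorator sees the width.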
Furthermore, we should allow users to return an arbitrary Python object and
infer the schema from its class (a bit like Pydantic):
{code:python}
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), False)
])

@udf(schema)
def get_person():
    return (name, age)

# could be

class MyReturnClass:
    name: Optional[str]
    age: int

@udf
def get_person() -> MyReturnClass:
    return MyReturnClass(name=name, age=age){code}
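The struct inference can again be driven by {{typing.get_type_hints}}, treating {{Optional[T]}} as a nullable field. A sketch under stated assumptions: {{infer_struct_ddl}} and the DDL-string output are illustrative, and since DDL strings cannot express nullability, a real implementation would build a {{StructType}} with nullable flags instead.

```python
import typing
from typing import Optional

# Assumed mapping from builtin annotations to Spark DDL type names.
_PY_TO_DDL = {int: "int", str: "string", float: "double", bool: "boolean"}

def infer_struct_ddl(cls):
    """Build a Spark DDL struct string from a class's field annotations (sketch)."""
    fields = []
    for name, ann in typing.get_type_hints(cls).items():
        # Unwrap Optional[T] (i.e. Union[T, None]); in the real feature this
        # would set the field's nullable flag rather than be discarded.
        if typing.get_origin(ann) is typing.Union and type(None) in typing.get_args(ann):
            ann = next(a for a in typing.get_args(ann) if a is not type(None))
        fields.append(f"{name} {_PY_TO_DDL[ann]}")
    return "struct<" + ", ".join(fields) + ">"

class MyReturnClass:
    name: Optional[str]
    age: int
```

Annotation order is preserved, so the inferred field order matches the class definition.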
We can hide essentially all of our Spark-specific type magic behind the typing
system Python users are already familiar with.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
