[ https://issues.apache.org/jira/browse/SPARK-52821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-52821.
----------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed

Issue resolved by pull request 51538
[https://github.com/apache/spark/pull/51538]

> Support int to DecimalType return type coercion in Pandas UDFs (useArrow=True)
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-52821
>                 URL: https://issues.apache.org/jira/browse/SPARK-52821
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 4.1.0
>            Reporter: Ben Hurdelhey
>            Assignee: Ben Hurdelhey
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>         Attachments: Screenshot 2025-07-16 at 11.49.31.png
>
>
> Problem: PySpark UDFs with useArrow=True do not support type coercion from
> int to DecimalType when the target precision of the DecimalType is too low.
> Example:
> {code:python}
> @udf(returnType=DecimalType(2, 1), useArrow=True)
> def test(x):
>     return 1
>
> spark.range(1, 2, 1, 1).select(test(col('id'))).display()  # expected: (Decimal) 1.0
> {code}
> throws
> {code:java}
> pyarrow.lib.ArrowInvalid: Precision is not great enough for the result. It should be at least 20
> {code}
>
> For a better overview of the current behavior, see this publicly available
> [notebook|https://www.databricks.com/wp-content/uploads/notebooks/python-udf-type-coercion.html];
> the proposed change is highlighted in the attached screenshot.
>
> Proposed solution: Add integer-to-decimal conversion for PySpark UDF return
> types. This is a net-new use case that previously threw an error, so the
> change is not a breaking one.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
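The coercion the issue asks for amounts to: take an integer result, scale it to the declared DecimalType scale, and verify the result still fits within the declared precision. The sketch below illustrates that behavior with Python's standard `decimal` module only; `coerce_int_to_decimal` is a hypothetical helper for illustration, not Spark's or PyArrow's actual implementation.

```python
from decimal import Decimal

def coerce_int_to_decimal(value: int, precision: int, scale: int) -> Decimal:
    """Hypothetical sketch of int -> DecimalType(precision, scale) coercion.

    Quantizes the integer to the target scale, then checks that the total
    number of significant digits fits within the declared precision.
    """
    # e.g. scale=1 -> quantize to Decimal('0.1'), so 1 becomes 1.0
    quantized = Decimal(value).quantize(Decimal(1).scaleb(-scale))
    # as_tuple().digits holds every digit of the coefficient, so its
    # length is the total digit count that must fit in `precision`
    if len(quantized.as_tuple().digits) > precision:
        raise ValueError(
            f"{value} does not fit DecimalType({precision}, {scale})"
        )
    return quantized

# The example from the issue: int 1 coerced to DecimalType(2, 1)
print(coerce_int_to_decimal(1, 2, 1))    # prints 1.0
```

A value that genuinely overflows the declared precision (e.g. `coerce_int_to_decimal(100, 2, 1)`) still raises, which matches the issue's framing: the fix enables coercion when the value fits, rather than silently truncating when it does not.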