L. C. Hsieh created SPARK-48921:
-----------------------------------
Summary: ScalaUDF in subquery should run through analyzer
Key: SPARK-48921
URL: https://issues.apache.org/jira/browse/SPARK-48921
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.3
Reporter: L. C. Hsieh
We got a customer issue that a `MergeInto` query on Iceberg table works earlier
but cannot work after upgrading to Spark 3.4.
The error looks like
```
Caused by: org.apache.spark.SparkRuntimeException: Error while decoding:
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to
nullable on unresolved object
upcast(getcolumnbyordinal(0, StringType), StringType, - root class:
java.lang.String).toString.
```
The source table of `MergeInto` uses `ScalaUDF`. The error happens when Spark
invokes the deserializer of input encoder of the `ScalaUDF` and the
deserializer is not resolved yet.
The encoders of ScalaUDF are resolved by the rule `ResolveEncodersInUDF` which
will be applied at the end of analysis phase.
During rewriting `MergeInto` to `ReplaceData` query, Spark creates an `Exists`
subquery and `ScalaUDF` is part of the plan of the subquery. Note that the
`ScalaUDF` is already resolved by the analyzer.
Then, in `ResolveSubquery` rule which resolves the subquery, it will resolve
the subquery plan if it is not resolved yet. Because the subquery containing
`ScalaUDF` is resolved, the rule skips it so `ResolveEncodersInUDF` won't be
applied on it. So the analyzed `ReplaceData` query contains a `ScalaUDF` with
encoders unresolved that cause the error.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]