L. C. Hsieh created SPARK-48921:
-----------------------------------

             Summary: ScalaUDF in subquery should run through analyzer
                 Key: SPARK-48921
                 URL: https://issues.apache.org/jira/browse/SPARK-48921
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.3
            Reporter: L. C. Hsieh


We got a customer issue that a `MergeInto` query on Iceberg table works earlier 
but cannot work after upgrading to Spark 3.4.

The error looks like

```
Caused by: org.apache.spark.SparkRuntimeException: Error while decoding: 
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
nullable on unresolved object
upcast(getcolumnbyordinal(0, StringType), StringType, - root class: 
java.lang.String).toString.
```

The source table of `MergeInto` uses `ScalaUDF`. The error happens when Spark 
invokes the deserializer of input encoder of the `ScalaUDF` and the 
deserializer is not resolved yet.

The encoders of ScalaUDF are resolved by the rule `ResolveEncodersInUDF` which 
will be applied at the end of analysis phase.

During rewriting `MergeInto` to `ReplaceData` query, Spark creates an `Exists` 
subquery and `ScalaUDF` is part of the plan of the subquery. Note that the 
`ScalaUDF` is already resolved by the analyzer.

Then, in `ResolveSubquery` rule which resolves the subquery, it will resolve 
the subquery plan if it is not resolved yet. Because the subquery containing 
`ScalaUDF` is resolved, the rule skips it so `ResolveEncodersInUDF` won't be 
applied on it. So the analyzed `ReplaceData` query contains a `ScalaUDF` with 
encoders unresolved that cause the error.






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to