Will Boulter created SPARK-46251:
------------------------------------

             Summary: Spark 3.3.3 tuple encoders do not correctly cast null 
into None for Option values
                 Key: SPARK-46251
                 URL: https://issues.apache.org/jira/browse/SPARK-46251
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0, 3.4.1, 3.4.0, 3.4.2, 3.3.3
            Reporter: Will Boulter


In Spark `3.3.2`, encoders created using `Encoders.tuple(encoder1, encoder2, 
..)` correctly handle casting `null` into `None` when the target type is an 
`Option`. 

 

In Spark `3.3.3`, this behaviour has changed: the Option value comes through 
as `null`, which is likely to cause a `NullPointerException` in most Scala code 
that operates on the Option. The change seems to be related to the following 
commit:

[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
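
To illustrate why a `null` Option is dangerous, here is a minimal sketch in plain Scala with no Spark dependency. The names `NullOptionSketch`, `fromDecoder` and `normalize` are hypothetical and used only for illustration. The `normalize` helper is an assumption about how calling code could defend itself; it is not a fix proposed in this report.

```scala
import scala.util.Try

object NullOptionSketch {
  // Stand-in for what a decoder might hand back: statically typed as an Option,
  // but the reference itself is null rather than None.
  val fromDecoder: Option[String] = null

  // Any method call on the null reference throws a NullPointerException.
  val mapped: Try[Option[Int]] = Try(fromDecoder.map(_.length))

  // Hypothetical defensive helper: Option(null) is None, so wrapping and then
  // flattening turns a null Option into None and leaves Some/None unchanged.
  def normalize[A](o: Option[A]): Option[A] = Option(o).flatten

  def main(args: Array[String]): Unit = {
    println(mapped.isFailure)
    println(normalize(fromDecoder))
    println(normalize(Option("x")))
  }
}
```

Pattern matching on such a value fails too (with a `MatchError` rather than a `NullPointerException`), since `null` matches neither `Some(_)` nor `None`.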

 

I have made a reproduction with a couple of examples in a public GitHub repo 
here:

[https://github.com/q-willboulter/spark-tuple-encoders-bug] 

 

This is most likely to be encountered in joins that can produce nulls, e.g. 
left or outer joins. When casting the result of a left join, it is sensible to 
wrap the right-hand side in an Option to handle the case where there is no 
match. Since 3.3.3, this can fail if the encoder is composed manually using 
`Encoders.tuple(leftEncoder, rightEncoder)`. If the entire tuple encoder 
`Encoder[(Left, Option[Right])]` is derived at once, the encoder works as 
expected. The bug appears to be in the following function inside 
`ExpressionEncoder.scala`:

```
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...
```
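
The left-join scenario described above could be sketched roughly as follows. This is not the code from the linked reproduction repo: the case classes `LeftRow` and `RightRow`, the object `TupleEncoderSketch` and its helper methods are hypothetical names, and the sketch assumes a local Spark session with the Spark SQL dependency on the classpath. It prints the right-hand side of each joined row for both encoders, so the two can be compared on a given Spark version.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical row types, not taken from the report or its reproduction repo.
case class LeftRow(id: Int, name: String)
case class RightRow(id: Int, value: String)

object TupleEncoderSketch {
  type Joined = (LeftRow, Option[RightRow])

  // Left join in which LeftRow(2, "b") has no match, so its right-hand side is null.
  def joined(spark: SparkSession): Dataset[(LeftRow, RightRow)] = {
    import spark.implicits._
    val left = Seq(LeftRow(1, "a"), LeftRow(2, "b")).toDS()
    val right = Seq(RightRow(1, "x")).toDS()
    left.joinWith(right, left("id") === right("id"), "left")
  }

  // Encoder composed manually from two component encoders (the path the report flags).
  def composed(spark: SparkSession): Encoder[Joined] = {
    import spark.implicits._
    Encoders.tuple(Encoders.product[LeftRow], implicitly[Encoder[Option[RightRow]]])
  }

  // Encoder for the whole tuple type derived in one step (reported to work as expected).
  def derived(spark: SparkSession): Encoder[Joined] = {
    import spark.implicits._
    implicitly[Encoder[Joined]]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("tuple-encoder-sketch")
      .getOrCreate()
    try {
      val ds = joined(spark)
      // Only print the right-hand side; calling methods on it could throw if it is null.
      ds.as[Joined](derived(spark)).collect().foreach(r => println(s"derived:  ${r._2}"))
      ds.as[Joined](composed(spark)).collect().foreach(r => println(s"composed: ${r._2}"))
    } finally spark.stop()
  }
}
```

According to the report, on the affected versions the unmatched row should show `None` with the derived encoder and `null` with the composed one; the sketch deliberately makes no claim about the output on any particular version.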



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

