[
https://issues.apache.org/jira/browse/SPARK-46251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Will Boulter updated SPARK-46251:
---------------------------------
Description:
In Spark {{3.3.2}}, encoders created using {{Encoders.tuple(encoder1, encoder2,
..)}} correctly handle casting {{null}} into {{None}} when the target type is
an Option.
In Spark {{3.3.3}}, this behaviour has changed and the Option value comes
through as {{null}} which is likely to cause a {{NullPointerException}} for
most Scala code that operates on the Option. The change seems to be related to
the following commit:
[https://github.com/apache/spark/commit/9110c05d54c392e55693eba4509be37c571d610a]
I have made a reproduction with a couple of examples in a public GitHub repo
here:
[https://github.com/q-willboulter/spark-tuple-encoders-bug]
This is most likely to be encountered with joins that can return null, e.g.
left or outer joins. When casting the result of a left join, it is sensible to
wrap the right-hand side in an Option to handle the case where there is no
match. Since 3.3.3 this fails if the encoder is derived manually using
{{Encoders.tuple(leftEncoder, rightEncoder)}}. If the entire tuple encoder
{{Encoder[(Left, Option[Right])]}} is derived at once using reflection, the
encoder works as expected. The bug appears to be in the following function
inside {{ExpressionEncoder.scala}}:
{code:java}
def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = ...{code}
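As a rough illustration of the scenario described above (this is not the code
from the linked reproduction repo), the two ways of obtaining the encoder could
be compared along these lines. The case classes {{LeftRow}} and {{RightRow}},
the object name and the sample data are hypothetical, and the sketch assumes
spark-sql is on the classpath and a local session can be started.
{code:scala}
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical case classes standing in for the two sides of the join.
case class LeftRow(id: Int)
case class RightRow(id: Int, value: String)

object TupleEncoderSketch {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("SPARK-46251-sketch")
    .getOrCreate()

  // Left outer join in which LeftRow(2) has no match on the right-hand side.
  def joined(): Dataset[(LeftRow, RightRow)] = {
    import spark.implicits._
    val left = Seq(LeftRow(1), LeftRow(2)).toDS()
    val right = Seq(RightRow(1, "a")).toDS()
    left.joinWith(right, left("id") === right("id"), "left_outer")
  }

  // Whole tuple encoder derived at once by reflection (via spark.implicits).
  def viaReflection(): Array[(LeftRow, Option[RightRow])] = {
    import spark.implicits._
    joined().as[(LeftRow, Option[RightRow])].collect()
  }

  // Tuple encoder assembled manually from two component encoders.
  def viaTuple(): Array[(LeftRow, Option[RightRow])] = {
    val enc: Encoder[(LeftRow, Option[RightRow])] =
      Encoders.tuple(Encoders.product[LeftRow], Encoders.product[Option[RightRow]])
    joined().as[(LeftRow, Option[RightRow])](enc).collect()
  }

  def main(args: Array[String]): Unit = {
    // Print whether the right-hand side came back as null for each row.
    viaReflection().foreach { case (l, r) => println(s"reflection: $l -> null? ${r == null}") }
    viaTuple().foreach { case (l, r) => println(s"tuple:      $l -> null? ${r == null}") }
    spark.stop()
  }
}
{code}
Going by the behaviour reported here, on affected versions the {{viaTuple}}
path is where the unmatched row would be expected to carry {{null}} instead of
{{None}}, while {{viaReflection}} yields {{None}}.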
> Spark 3.3.3 tuple encoders built using `Encoders.tuple` do not correctly cast
> null into None for Option values
> --------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-46251
> URL: https://issues.apache.org/jira/browse/SPARK-46251
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.3, 3.4.2, 3.4.0, 3.4.1, 3.5.0
> Reporter: Will Boulter
> Priority: Major