mbutrovich commented on code in PR #4557:
URL: https://github.com/apache/datafusion-comet/pull/4557#discussion_r3349573678
##########
spark/src/main/scala/org/apache/comet/codegen/CometBatchKernelCodegen.scala:
##########
@@ -401,10 +401,15 @@ object CometBatchKernelCodegen extends Logging with
CometExprTraitShim {
* True iff every node in the tree propagates nulls (`NullIntolerant`,
`BoundReference`, or
* `Literal`). Gates the [[defaultBody]] short-circuit, which is only
correct when no node
* (`Coalesce`, `If`, `CaseWhen`, `Concat`, ...) breaks the propagation
chain.
+ *
+ * `TryEval` is rejected: it is `NullIntolerant` (null input -> null
output), but its
+ * `doGenCode` also yields null on non-null input when the child throws. The
short-circuit
+ * assumes non-null inputs imply non-null output and would feed a null
`ev.value` to the writer.
*/
private def allNullIntolerant(expr: Expression): Boolean =
!expr.exists {
case _: BoundReference | _: Literal => false
+ case _: TryEval => true
Review Comment:
This and #4579 are fixing the same latent bug in the codegen dispatcher's
NullIntolerant short-circuit. Both expose the case where an expression reports
`nullIntolerant=true` but can still emit null on non-null inputs: `TryEval`
here via the caught exception, `MakeTimestamp(failOnError=false)` in #4579 via
the caught `DateTimeException`. In both the short-circuit's else branch wrote
`ev.value` without re-checking `ev.isNull`.
I just approved #4579. Its fix is the general one: it keeps the null-input
fast skip and re-adds the post-eval `if (ev.isNull) setNull else write` guard
for any nullable NullIntolerant root, so it already covers `try_aes_decrypt`
(the `TryEval` root is nullable, and `TryEval.doGenCode` sets `ev.isNull=true`
on the caught exception, so the guarded write skips the binary writer and the
NPE never happens).
Once #4579 lands, the `allNullIntolerant`-rejects-`TryEval` change in
`CometBatchKernelCodegen.scala` here is redundant. Worse, it pushes `TryEval`
trees out of the now-correct short-circuit onto the default path, which loses
the null-input fast skip for no correctness benefit. Could you rebase on #4579
and drop the `CometBatchKernelCodegen.scala` edit, keeping the serde additions
(`statics.scala`, `QueryPlanSerde.scala`) and the SQL fixtures? That leaves the
two PRs from conflicting in the merge queue.
One thing I checked: the only AES path that produces null on non-null input
is the `TryEval` one, which is nullable, so #4579's guard engages. The non-try
`aes_decrypt` / `aes_encrypt` throw on bad input rather than returning null,
matching Spark, so the non-nullable direct-write path stays correct for those.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]