mbutrovich commented on code in PR #4557:
URL: https://github.com/apache/datafusion-comet/pull/4557#discussion_r3349573678


##########
spark/src/main/scala/org/apache/comet/codegen/CometBatchKernelCodegen.scala:
##########
@@ -401,10 +401,15 @@ object CometBatchKernelCodegen extends Logging with 
CometExprTraitShim {
    * True iff every node in the tree propagates nulls (`NullIntolerant`, 
`BoundReference`, or
    * `Literal`). Gates the [[defaultBody]] short-circuit, which is only 
correct when no node
    * (`Coalesce`, `If`, `CaseWhen`, `Concat`, ...) breaks the propagation 
chain.
+   *
+   * `TryEval` is rejected: it is `NullIntolerant` (null input -> null 
output), but its
+   * `doGenCode` also yields null on non-null input when the child throws. The 
short-circuit
+   * assumes non-null inputs imply non-null output and would feed a null 
`ev.value` to the writer.
    */
   private def allNullIntolerant(expr: Expression): Boolean =
     !expr.exists {
       case _: BoundReference | _: Literal => false
+      case _: TryEval => true

Review Comment:
   This and #4579 are fixing the same latent bug in the codegen dispatcher's 
NullIntolerant short-circuit. Both expose the case where an expression reports 
`nullIntolerant=true` but can still emit null on non-null inputs: `TryEval` 
here via the caught exception, `MakeTimestamp(failOnError=false)` in #4579 via 
the caught `DateTimeException`. In both the short-circuit's else branch wrote 
`ev.value` without re-checking `ev.isNull`.
   
   I just approved #4579. Its fix is the general one: it keeps the null-input 
fast skip and re-adds the post-eval `if (ev.isNull) setNull else write` guard 
for any nullable NullIntolerant root, so it already covers `try_aes_decrypt` 
(the `TryEval` root is nullable, and `TryEval.doGenCode` sets `ev.isNull=true` 
on the caught exception, so the guarded write skips the binary writer and the 
NPE never happens).
   
   Once #4579 lands, the `allNullIntolerant`-rejects-`TryEval` change in 
`CometBatchKernelCodegen.scala` here is redundant. Worse, it pushes `TryEval` 
trees out of the now-correct short-circuit onto the default path, which loses 
the null-input fast skip for no correctness benefit. Could you rebase on #4579 
and drop the `CometBatchKernelCodegen.scala` edit, keeping the serde additions 
(`statics.scala`, `QueryPlanSerde.scala`) and the SQL fixtures? That keeps the 
two PRs from conflicting on `CometBatchKernelCodegen.scala`.
   
   One thing I checked: the only AES path that produces null on non-null input 
is the `TryEval` one, which is nullable, so #4579's guard engages. The non-try 
`aes_decrypt` / `aes_encrypt` throw on bad input rather than returning null, 
matching Spark, so the non-nullable direct-write path stays correct for those.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to