[I] CometProject and CometHashAggregate do not perform cross-sibling subexpression elimination over ScalaUDF [datafusion-comet]

via GitHub Fri, 29 May 2026 16:22:18 -0700


andygrove opened a new issue, #4516:
URL: https://github.com/apache/datafusion-comet/issues/4516


   When a `ScalaUDF` is dispatched into the native plan via the JVM Scala UDF 
codegen dispatcher (enabled by default in #4514), Comet's `CometProject` and 
`CometHashAggregate` do not implement Spark's cross-sibling common 
subexpression elimination over `ScalaUDF`. An expression such as `sum(udf(b) + 
udf(b) + udf(b))` therefore invokes the UDF body once per reference instead of 
once.
   
   This is observable in Spark's `SQLQuerySuite` "Common subexpression 
elimination" test: the call count for the aggregate case is 3 under Comet 
versus 1 in vanilla Spark. The query result is unchanged; only the number of 
UDF invocations differs.
   
   Follow-on from #4514. We should extend the cross-sibling CSE that 
`CometProject` performs to the aggregate operator's input projection (and any 
other operator that builds an input projection) so that repeated `ScalaUDF` 
references are evaluated once.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[I] CometProject and CometHashAggregate do not perform cross-sibling subexpression elimination over ScalaUDF [datafusion-comet]

Reply via email to