schenksj opened a new issue, #4526:
URL: https://github.com/apache/datafusion-comet/issues/4526

   ### Problem
   
   Comet serializes `And`/`Or` as a left-deep `BinaryExpr` tree (one nesting 
level per operand). A query with many conjuncts/disjuncts -- e.g. a wide `WHERE 
a AND b AND c AND ...`, or generated data-skipping predicates -- produces a 
proto nested deeper than protobuf's default recursion limit (100). The overflow 
fires when the serialized `Operator` is **re-parsed**: on the JVM via 
`OperatorOuterClass.Operator.parseFrom` (e.g. `findShuffleScanIndices`, 
explain) and in the Rust `prost` decoder. The query fails even though Comet can 
evaluate it.
   
   ### Why it's safe to rebalance
   
   Comet evaluates `And`/`Or` vectorially -- both sides are always evaluated, 
with no row-level short-circuit -- so the chain is fully associative. 
Rebalancing the flattened operands into a depth-`O(log n)` tree is semantically 
identical; it only changes the proto's shape.
   
   ### Proposed fix
   
   Add `QueryPlanSerde.flattenAssociative` (flatten an associative chain to its 
leaves) and `createBalancedBinaryExpr` (rebuild as a balanced tree), and route 
`CometAnd`/`CometOr` through them.
   
   ### Relationship to the Delta integration
   
   This is a standalone correctness/robustness fix for any wide boolean 
predicate. It is **surfaced by** the in-progress Delta Lake contrib integration 
(Delta data-skipping builds deep conjunctions), so it would help to prioritize 
it alongside that work. A PR will follow shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to