schenksj opened a new issue, #4526: URL: https://github.com/apache/datafusion-comet/issues/4526
### Problem Comet serializes `And`/`Or` as a left-deep `BinaryExpr` tree (one nesting level per operand). A query with many conjuncts/disjuncts -- e.g. a wide `WHERE a AND b AND c AND ...`, or generated data-skipping predicates -- produces a proto nested deeper than protobuf's default recursion limit (100). The overflow fires when the serialized `Operator` is **re-parsed**: on the JVM via `OperatorOuterClass.Operator.parseFrom` (e.g. `findShuffleScanIndices`, explain) and in the Rust `prost` decoder. The query fails even though Comet can evaluate it. ### Why it's safe to rebalance Comet evaluates `And`/`Or` vectorially -- both sides are always evaluated, with no row-level short-circuit -- so the chain is fully associative. Rebalancing the flattened operands into a depth-`O(log n)` tree is semantically identical; it only changes the proto's shape. ### Proposed fix Add `QueryPlanSerde.flattenAssociative` (flatten an associative chain to its leaves) and `createBalancedBinaryExpr` (rebuild as a balanced tree), and route `CometAnd`/`CometOr` through them. ### Relationship to the Delta integration This is a standalone correctness/robustness fix for any wide boolean predicate. It is **surfaced by** the in-progress Delta Lake contrib integration (Delta data-skipping builds deep conjunctions), so it would help to prioritize it alongside that work. A PR will follow shortly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
