schenksj opened a new pull request, #4531:
URL: https://github.com/apache/datafusion-comet/pull/4531

   ## Which issue does this PR close?
   
   Closes #4526.
   
   ## Rationale for this change
   
   A left-deep chain of N associative boolean operands (`And`/`Or`) serializes 
to a protobuf message nested N levels deep. When N exceeds protobuf's default 
recursion limit (100), the parser rejects the message when the serialized plan 
is re-parsed -- on the JVM in `OperatorOuterClass.Operator.parseFrom` (e.g. 
via `findShuffleScanIndices` / explain) and in the Rust `prost` decoder -- so 
an otherwise-supported query fails.
   
   Comet evaluates `And`/`Or` in vectorized form (both sides are always 
evaluated, with no row-level short-circuit), so regrouping a chain does not 
change which operands are evaluated; the chains are fully associative and safe 
to rebalance.
   
   This is a standalone fix; it was surfaced while working on the Delta Lake 
contrib integration (Delta data-skipping builds deep conjunctions), so 
prioritizing it helps that effort, but it applies to any wide boolean predicate.
   
   ## What changes are included in this PR?
   
   - `QueryPlanSerde.flattenAssociative` flattens an associative `And`/`Or` 
chain into its leaf operands.
   - `QueryPlanSerde.createBalancedBinaryExpr` rebuilds the operands as a 
balanced `O(log n)`-depth `BinaryExpr` tree.
   - `CometAnd` / `CometOr` are routed through these instead of the left-deep 
`createBinaryExpr`.
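
   The flatten-then-rebalance idea above can be sketched in a few lines of 
Scala. This is a toy model, not the code in this PR: `Expr`, `Leaf`, `And`, 
and the `BalanceSketch` object are hypothetical stand-ins for Spark 
expressions and the proto `BinaryExpr`, and the real 
`flattenAssociative` / `createBalancedBinaryExpr` signatures may differ.

```scala
// Toy expression model: illustrative stand-ins, not Comet's or Spark's classes.
sealed trait Expr
final case class Leaf(name: String) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

object BalanceSketch {
  // Collect the leaf operands of an AND chain in left-to-right order.
  // An explicit stack is used so a very deep chain cannot overflow the call stack.
  def flatten(root: Expr): Vector[Expr] = {
    val out = Vector.newBuilder[Expr]
    var stack: List[Expr] = List(root)
    while (stack.nonEmpty) {
      stack.head match {
        case And(l, r) =>
          // Push left ahead of right so that left is visited first.
          stack = l :: r :: stack.tail
        case leaf =>
          out += leaf
          stack = stack.tail
      }
    }
    out.result()
  }

  // Rebuild the operands as a balanced tree by halving the index range.
  // Operand order is preserved; recursion depth is O(log n).
  def balanced(ops: Vector[Expr]): Expr = {
    require(ops.nonEmpty, "need at least one operand")
    def build(lo: Int, hi: Int): Expr = // half-open range [lo, hi)
      if (hi - lo == 1) ops(lo)
      else {
        val mid = lo + (hi - lo) / 2
        And(build(lo, mid), build(mid, hi))
      }
    build(0, ops.length)
  }

  // Nesting depth of a tree, counting a leaf as depth 1.
  def depth(e: Expr): Int = e match {
    case And(l, r) => 1 + math.max(depth(l), depth(r))
    case _         => 1
  }

  // Build the left-deep shape: ((p1 AND p2) AND p3) AND ...
  def leftDeep(n: Int): Expr =
    (2 to n).foldLeft[Expr](Leaf("p1"))((acc, i) => And(acc, Leaf(s"p$i")))
}
```

   In this toy model, `depth(leftDeep(200))` is 200, while 
`depth(balanced(flatten(leftDeep(200))))` is 9, comfortably under a recursion 
limit of 100. Flattening the balanced tree again returns the same operands in 
the same order, which is the sense in which only the shape changes.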
   
   The rebalanced expression is semantically identical to the original; only 
the shape of the proto changes.
   
   ## How are these changes tested?
   
   New test in `CometExpressionSuite`: projects a 200-deep AND chain and a 
200-deep OR chain (distinct literals; `>`/`<` so neither `CombineFilters` nor 
`OptimizeIn` collapses them) and asserts Comet executes them natively with 
correct results. The test fails on `main` with `InvalidProtocolBufferException: 
Protocol message had too many levels of nesting` and passes with this change. 
Full `CometExpressionSuite` passes (124/0).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

