dojiong opened a new pull request, #1915:
URL: https://github.com/apache/iceberg-rust/pull/1915
## Which issue does this PR close?
- Closes #.
## What changes are included in this PR?
A stack overflow occurs when processing data files containing a large number
of equality deletes (e.g., > 6000 rows).
This happens because `parse_equality_deletes_record_batch_stream` previously
constructed the final predicate by chaining `.and()` calls in a loop:
```rust
result_predicate = result_predicate.and(row_predicate.not());
```
This resulted in a deeply nested, left-skewed tree whose depth equals the
number of rows (N). When `rewrite_not()` (which uses a recursive visitor
pattern) was subsequently called on this structure, or when the structure
was dropped, the call stack limit was exceeded.
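To illustrate the depth growth, here is a minimal sketch using a hypothetical `Pred` enum. It is a stand-in for illustration only, not the actual iceberg-rust `Predicate` type, and `fold_linear` and `depth` are made-up helper names. It mimics the loop above and measures the resulting tree depth, which grows linearly with the number of rows:

```rust
// Hypothetical stand-in for a predicate type (NOT the iceberg-rust `Predicate`).
#[derive(Debug)]
enum Pred {
    AlwaysTrue,
    Row(u32),
    Not(Box<Pred>),
    And(Box<Pred>, Box<Pred>),
}

impl Pred {
    fn and(self, other: Pred) -> Pred {
        Pred::And(Box::new(self), Box::new(other))
    }

    fn not(self) -> Pred {
        Pred::Not(Box::new(self))
    }
}

// Recursive depth computation; a recursive visitor or the recursive drop of
// the boxed children needs a comparable number of stack frames.
fn depth(p: &Pred) -> usize {
    match p {
        Pred::AlwaysTrue | Pred::Row(_) => 1,
        Pred::Not(inner) => 1 + depth(inner),
        Pred::And(l, r) => 1 + depth(l).max(depth(r)),
    }
}

// Linear accumulation: every iteration wraps the previous result as the
// left child of a new `And` node, so the tree leans entirely to the left.
fn fold_linear(n: u32) -> Pred {
    let mut result = Pred::AlwaysTrue;
    for i in 0..n {
        result = result.and(Pred::Row(i).not());
    }
    result
}

fn main() {
    // Row counts are kept small so the recursive `depth` and drop stay safe.
    for n in [10, 100, 1000] {
        println!("rows = {n:>4}, depth = {}", depth(&fold_linear(n)));
    }
}
```

In this sketch every additional row adds one level to the tree, so any recursive traversal (or the implicit recursive drop of the nested `Box` values) uses stack space proportional to N.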
### Changes
1. Balanced Tree Construction: Refactored the predicate combination
   logic. Instead of linear accumulation, row predicates are collected and
   combined pairwise to build a balanced tree. This reduces the tree depth
   from O(N) to O(log N).
2. Early Rewrite: `rewrite_not()` is now called on each individual row
   predicate before the predicates are combined. This ensures that only
   simplified predicates are combined and avoids traversing a massive
   unoptimized tree later.
3. Regression Test: Added
   `test_large_equality_delete_batch_stack_overflow`, which processes 20,000
   equality delete rows to verify the fix.
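The pairwise combination from item 1 can be sketched as follows. This is a rough illustration under assumptions, not the code in this PR: `Pred` is a hypothetical stand-in type, and `combine_balanced` and `depth` are made-up helper names. Each round pairs up neighbouring predicates, so the number of pending predicates roughly halves per round:

```rust
// Hypothetical stand-in for a predicate type (NOT the iceberg-rust `Predicate`).
#[derive(Debug)]
enum Pred {
    AlwaysTrue,
    Row(u32),
    And(Box<Pred>, Box<Pred>),
}

fn depth(p: &Pred) -> usize {
    match p {
        Pred::AlwaysTrue | Pred::Row(_) => 1,
        Pred::And(l, r) => 1 + depth(l).max(depth(r)),
    }
}

// Combine predicates pairwise, round by round, until one remains.
// An empty input yields `AlwaysTrue` (the identity for AND).
fn combine_balanced(mut preds: Vec<Pred>) -> Pred {
    if preds.is_empty() {
        return Pred::AlwaysTrue;
    }
    while preds.len() > 1 {
        let mut next = Vec::with_capacity((preds.len() + 1) / 2);
        let mut iter = preds.into_iter();
        while let Some(left) = iter.next() {
            match iter.next() {
                // Two neighbours become the children of one `And` node.
                Some(right) => next.push(Pred::And(Box::new(left), Box::new(right))),
                // An odd element out is carried into the next round unchanged.
                None => next.push(left),
            }
        }
        preds = next;
    }
    preds.pop().expect("exactly one predicate remains")
}

fn main() {
    let rows: Vec<Pred> = (0..20_000).map(Pred::Row).collect();
    let combined = combine_balanced(rows);
    println!("depth for 20,000 rows = {}", depth(&combined));
}
```

The loop is iterative, so building the tree itself does not recurse, and the shallow result keeps later recursive traversals and the drop well within the stack limit.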
## Are these changes tested?
- [x] New regression test `test_large_equality_delete_batch_stack_overflow`
passes.
- [x] All existing tests in `arrow::caching_delete_file_loader` pass.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]