The GitHub Actions job "Nightly PyPI Build" on iceberg-rust.git/main has failed. Run started by GitHub user Fokko (triggered by Fokko).
Head commit for run: 16906c127d521395a789a9019350e467cc34d063 / Lo <[email protected]>
fix: stack overflow when loading large equality deletes (#1915)

## Which issue does this PR close?

- Closes #.

## What changes are included in this PR?

A stack overflow occurs when processing data files that contain a large number of equality deletes (e.g., more than 6,000 rows). This happens because `parse_equality_deletes_record_batch_stream` previously built the final predicate by calling `.and()` linearly in a loop:

```rust
result_predicate = result_predicate.and(row_predicate.not());
```

This produced a deeply nested, left-skewed tree whose depth equals the number of rows (N). When `rewrite_not()` (which uses a recursive visitor pattern) was later called on this structure, or when the structure was dropped, the call stack limit was exceeded.

Changes:

1. Balanced tree construction: Refactored the predicate combination logic. Instead of accumulating linearly, row predicates are collected and combined pairwise to build a balanced tree. This reduces the tree depth from O(N) to O(log N).
2. Early rewrite: `rewrite_not()` is now called on each individual row predicate before the predicates are combined. This ensures only simplified predicates are combined and avoids traversing a massive unoptimized tree later.
3. Regression test: Added `test_large_equality_delete_batch_stack_overflow`, which processes 20,000 equality delete rows to verify the fix.

## Are these changes tested?

- [x] New regression test `test_large_equality_delete_batch_stack_overflow` passed.
- [x] All existing tests in `arrow::caching_delete_file_loader` passed.

Co-authored-by: Renjie Liu <[email protected]>

Report URL: https://github.com/apache/iceberg-rust/actions/runs/20216205951

With regards,
GitHub Actions via GitBox
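The two techniques described in the commit message (rewriting each row predicate early, then combining the rewritten predicates pairwise into a balanced tree) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: `Pred`, `row_predicate`, and `combine_balanced` are hypothetical stand-ins written for this sketch, not the iceberg crate's `Predicate` type or the code actually merged in #1915. Column names and values are made up; only the 20,000-row count mirrors the regression test.

```rust
// Hypothetical, simplified predicate tree; NOT the iceberg crate's `Predicate` API.
#[derive(Debug, PartialEq)]
enum Pred {
    Eq(&'static str, i64),
    NotEq(&'static str, i64),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
    Not(Box<Pred>),
}

impl Pred {
    fn and(self, other: Pred) -> Pred {
        Pred::And(Box::new(self), Box::new(other))
    }
    fn or(self, other: Pred) -> Pred {
        Pred::Or(Box::new(self), Box::new(other))
    }
    fn not(self) -> Pred {
        Pred::Not(Box::new(self))
    }

    /// Push NOT down to the leaves using De Morgan's laws. This is recursive,
    /// so it is only safe on shallow trees, such as a single row predicate.
    fn rewrite_not(self) -> Pred {
        match self {
            Pred::Not(inner) => match *inner {
                Pred::Eq(c, v) => Pred::NotEq(c, v),
                Pred::NotEq(c, v) => Pred::Eq(c, v),
                Pred::And(l, r) => (*l).not().rewrite_not().or((*r).not().rewrite_not()),
                Pred::Or(l, r) => (*l).not().rewrite_not().and((*r).not().rewrite_not()),
                Pred::Not(p) => (*p).rewrite_not(), // double negation cancels
            },
            Pred::And(l, r) => (*l).rewrite_not().and((*r).rewrite_not()),
            Pred::Or(l, r) => (*l).rewrite_not().or((*r).rewrite_not()),
            leaf => leaf,
        }
    }

    /// Tree depth (recursive; fine for the shallow, balanced trees built here).
    fn depth(&self) -> usize {
        match self {
            Pred::Eq(..) | Pred::NotEq(..) => 1,
            Pred::Not(p) => 1 + p.depth(),
            Pred::And(l, r) | Pred::Or(l, r) => 1 + l.depth().max(r.depth()),
        }
    }
}

/// For one equality-delete row, build NOT(c1 = v1 AND c2 = v2 ...) and rewrite it
/// immediately, so only already-simplified predicates are combined later.
fn row_predicate(cols: &[&'static str], vals: &[i64]) -> Pred {
    let mut iter = cols.iter().zip(vals).map(|(c, v)| Pred::Eq(*c, *v));
    let first = iter.next().expect("at least one equality column");
    iter.fold(first, Pred::and).not().rewrite_not()
}

/// Combine predicates pairwise, level by level, so the result has O(log N) depth
/// instead of the O(N) depth of a left fold. Returns None for an empty input.
fn combine_balanced(mut preds: Vec<Pred>) -> Option<Pred> {
    while preds.len() > 1 {
        let mut next = Vec::with_capacity((preds.len() + 1) / 2);
        let mut iter = preds.into_iter();
        while let Some(left) = iter.next() {
            next.push(match iter.next() {
                Some(right) => left.and(right),
                None => left, // an odd element is carried up to the next level
            });
        }
        preds = next;
    }
    preds.pop()
}

fn main() {
    // 20,000 delete rows with two (hypothetical) equality columns.
    let cols = ["id", "part"];
    let preds: Vec<Pred> = (0..20_000i64)
        .map(|i| row_predicate(&cols, &[i, i % 7]))
        .collect();
    let combined = combine_balanced(preds).expect("non-empty input");
    // Each rewritten row predicate is Or(NotEq, NotEq) (depth 2); 15 pairwise
    // levels are needed for 20,000 inputs, so the combined depth is 17.
    println!("depth = {}", combined.depth());
}
```

With a left fold, the same 20,000 predicates would form a tree whose depth grows with the row count, so any recursive traversal (or the recursive drop of the nested `Box`es) risks exhausting the stack. The pairwise version keeps depth logarithmic, which is why the early per-row `rewrite_not()` and the balanced combination work well together in this sketch.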
