mbutrovich opened a new pull request, #1806:
URL: https://github.com/apache/iceberg-rust/pull/1806

   ## Which issue does this PR close?
   
   Partially addresses #1749.
   
   ## What changes are included in this PR?
   
   This PR fixes two related correctness bugs in `ArrowReader::build_deletes_row_selection()` where position deletes targeting rows in skipped row groups, or in row groups reached after a skip, were not applied correctly.
   
   ### Background: How These Bugs Were Discovered
   
   While running Apache Spark + Apache Iceberg integration tests through 
DataFusion Comet, we discovered that the following tests were failing or 
hanging:
   - org.apache.iceberg.spark.extensions.TestMergeOnReadMerge
   - org.apache.iceberg.spark.extensions.TestMergeOnReadDelete
   - org.apache.iceberg.spark.extensions.TestMergeOnReadUpdate
   
   Investigation revealed that recent work to support Iceberg's file splitting 
feature (via `filter_row_groups_by_byte_range()`) exposed latent bugs in the 
position delete logic. While the byte range filtering code itself is correct, 
it exercises code paths that were previously untested, revealing these 
pre-existing issues.
   
   #### Bug 1: Missing base index increment when skipping row groups
   
   **The Issue:**
   
   When processing a Parquet file with multiple row groups, if a position 
delete targets a row in a later row group, the function would skip row groups 
without deletes but fail to increment `current_row_group_base_idx`. This caused 
the row index tracking to become desynchronized.
   
   **Example scenario:**
   - File with 2 row groups: rows 0-99 (group 0) and rows 100-199 (group 1)
   - Position delete targets row 199 (last row in group 1)
   - When processing group 0: delete (199) is beyond the group's range, so the code hits `continue` at lines 469-471
   - BUG: `current_row_group_base_idx` is NOT incremented and stays at 0
   - When processing group 1: the code thinks its rows start at 0 instead of 100
   - Delete at position 199 is never applied (the function thinks the file only has rows 0-99); the sketch below reproduces this
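
   The desynchronization is easy to reproduce with a toy version of the loop. This is a minimal sketch, not the actual reader.rs code; only the variable names above are reused:

   ```rust
   fn main() {
       let row_group_sizes = [100u64, 100]; // rows 0-99 (group 0), 100-199 (group 1)
       let delete_pos = 199u64;

       let mut current_row_group_base_idx = 0u64;
       for (group, &row_group_num_rows) in row_group_sizes.iter().enumerate() {
           let next_row_group_base_idx = current_row_group_base_idx + row_group_num_rows;
           if delete_pos >= next_row_group_base_idx {
               // Buggy skip: `continue` without incrementing
               // `current_row_group_base_idx`, so on the next iteration
               // group 1 also appears to cover rows 0-99.
               continue;
           }
           println!(
               "delete {delete_pos} applies to group {group} at local offset {}",
               delete_pos - current_row_group_base_idx
           );
           current_row_group_base_idx = next_row_group_base_idx;
       }
       // Nothing is printed: both groups are skipped and the delete at 199
       // is silently dropped.
   }
   ```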
   
   **The Fix:**
   
   Add `current_row_group_base_idx += row_group_num_rows` before the two 
`continue` statements at lines ~470 and ~481. This ensures row index tracking 
stays synchronized when skipping row groups.
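
   In sketch form (same toy loop as above, with a hypothetical helper name), the skip path with the fix applied looks like this:

   ```rust
   /// Returns the row group containing `delete_pos`, with the base index
   /// kept in sync on the skip path.
   fn group_containing(row_group_sizes: &[u64], delete_pos: u64) -> Option<usize> {
       let mut current_row_group_base_idx = 0u64;
       for (group, &row_group_num_rows) in row_group_sizes.iter().enumerate() {
           let next_row_group_base_idx = current_row_group_base_idx + row_group_num_rows;
           if delete_pos >= next_row_group_base_idx {
               // The fix: advance the base index before skipping this group.
               current_row_group_base_idx += row_group_num_rows;
               continue;
           }
           return Some(group);
       }
       None
   }
   ```

   With the increment in place, the scenario above resolves the delete at 199 to group 1 (local offset 99) instead of dropping it.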
   
   #### Bug 2: Stale cached delete index when skipping unselected row groups
   
   **The Issue:**
   
   When row group selection is active (e.g., via byte range filtering for file 
splits) and an unselected row group is skipped, the cached
   `next_deleted_row_idx_opt` variable can become stale, leading to either lost 
deletes or infinite loops depending on the scenario.
   
   The function maintains a cached value (`next_deleted_row_idx_opt`) 
containing the next delete to apply. When skipping unselected row groups, it 
calls `delete_vector_iter.advance_to(next_row_group_base_idx)` to position the 
iterator, but this doesn't automatically update the cached variable.
   
   **Two problematic scenarios:**
   
   1. Stale cache causes infinite loop (the bug we hit):
     - File with 2 row groups: rows 0-99 (group 0) and rows 100-199 (group 1)
     - Position delete at row 0 (in group 0)
     - Row group selection: read ONLY group 1
     - Initial state: `next_deleted_row_idx_opt = Some(0)` (cached)
     - Skip group 0: `advance_to(100)` positions iterator past delete at 0
     - BUG: cached value still `Some(0)` - STALE!
     - Process group 1: the loop condition `0 < 200` is `true`, but `current_idx (100) != next_deleted_row_idx (0)`, so neither branch executes, which can result in an infinite loop
   2. Unconditionally calling `next()` loses deletes:
     - File with 2 row groups: rows 0-99 (group 0) and rows 100-199 (group 1)
     - Position delete at row 199 (in group 1)
     - Row group selection: read ONLY group 1
     - Initial state: `next_deleted_row_idx_opt = Some(199)` (cached, already 
correct!)
     - Skip group 0: `advance_to(100)` - iterator already positioned correctly
     - If we unconditionally call `next()`: BUG - it consumes the delete at 199, advancing past it
     - Process group 1: iterator exhausted, delete is lost
   
   **The Fix:**
   
   - If the cached value is `< next_row_group_base_idx` (stale), refresh it from the iterator, avoiding the infinite loop
   - If the cached value is `>= next_row_group_base_idx` (still valid), keep it, preserving the delete
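
   A minimal, runnable sketch of this refresh logic, assuming a plain sorted iterator stands in for the delete vector: the names `next_deleted_row_idx_opt` and `advance_to` come from the description above, while `refresh_cached_delete` and the rest are hypothetical.

   ```rust
   /// Refresh the cached next-delete index after skipping to
   /// `next_row_group_base_idx`. The `find` call stands in for
   /// `delete_vector_iter.advance_to(...)` followed by `next()`.
   fn refresh_cached_delete(
       next_deleted_row_idx_opt: &mut Option<u64>,
       delete_vector_iter: &mut impl Iterator<Item = u64>,
       next_row_group_base_idx: u64,
   ) {
       if next_deleted_row_idx_opt.map_or(false, |idx| idx < next_row_group_base_idx) {
           // Stale: pull the first delete at or past the new base index.
           *next_deleted_row_idx_opt =
               delete_vector_iter.find(|&idx| idx >= next_row_group_base_idx);
       }
       // Otherwise the cached delete already lies at or past the new base
       // index; calling next() here would consume and lose it (scenario 2).
   }

   fn main() {
       // Scenario 1: the only delete is at row 0, in skipped group 0.
       let mut iter = vec![0u64].into_iter();
       let mut cached = iter.next(); // Some(0) - stale once group 0 is skipped
       refresh_cached_delete(&mut cached, &mut iter, 100);
       assert_eq!(cached, None); // refreshed, so the loop over group 1 terminates

       // Scenario 2: the delete is at row 199, in selected group 1.
       let mut iter = vec![199u64].into_iter();
       let mut cached = iter.next(); // Some(199) - already correct
       refresh_cached_delete(&mut cached, &mut iter, 100);
       assert_eq!(cached, Some(199)); // not consumed, so the delete is preserved
   }
   ```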
   
   ## Are these changes tested?
   
   Yes. This PR adds three unit tests in reader.rs:
   
   1. `test_position_delete_across_multiple_row_groups` - Tests bug 1 (missing 
base index increment)
   2. `test_position_delete_with_row_group_selection` - Tests bug 2 scenario 
where delete is in selected group
   3. `test_position_delete_in_skipped_row_group` - Tests bug 2 scenario where 
delete is in skipped group (would hang without fix)
   
   Additionally, these fixes resolve failures in Iceberg Java's spark-extensions tests when running with DataFusion Comet PR https://github.com/apache/datafusion-comet/pull/2528:
   - org.apache.iceberg.spark.extensions.TestMergeOnReadMerge
   - org.apache.iceberg.spark.extensions.TestMergeOnReadDelete
   - org.apache.iceberg.spark.extensions.TestMergeOnReadUpdate

