GayathriSrividya opened a new pull request, #3448: URL: https://github.com/apache/iceberg-python/pull/3448
Closes #3272

## What this changes

This PR updates the Arrow scan path in `_task_to_record_batches` to avoid redundant filtering when there are no positional deletes.

- Keeps predicate pushdown in `Scanner.from_fragment` as the only filter path when `positional_deletes` is absent.
- Applies `current_batch.filter(pyarrow_filter)` only in the positional-delete path, after deletes are applied.
- Preserves empty-batch handling after both delete application and conditional filtering.

## Why

The previous flow could perform an extra table-level refilter even when the scanner had already applied the predicate. This change removes that stale workaround path while keeping correct behavior for positional delete scenarios.

## Tests

Added regression coverage in `tests/io/test_pyarrow.py`:

- `test_task_to_record_batches_filter_without_positional_deletes_avoids_table_refilter`
- `test_task_to_record_batches_filter_with_positional_deletes_handles_empty_batch`

Validated locally:

- `python -m pytest tests/io/test_pyarrow.py -q -k "task_to_record_batches_nanos or filter_without_positional_deletes_avoids_table_refilter or filter_with_positional_deletes_handles_empty_batch"`
- `make lint`
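
For reviewers who want the control flow at a glance, here is a rough plain-Python model of the filtering behavior described under "What this changes". It is a sketch, not the code in this PR: it uses lists of dicts instead of Arrow batches, and `scan_fragment` and `task_to_batches` are hypothetical stand-ins for `Scanner.from_fragment` and `_task_to_record_batches`. It also assumes the scanner receives no predicate when positional deletes are present, since filtering first would shift the row positions the deletes refer to.

```python
from typing import Callable, Iterator, Optional

Row = dict
Predicate = Callable[[Row], bool]


def scan_fragment(
    rows: list[Row], batch_size: int, predicate: Optional[Predicate]
) -> Iterator[list[Row]]:
    """Hypothetical scanner: yields batches, applying a pushed-down predicate if given."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        if predicate is not None:
            batch = [row for row in batch if predicate(row)]
        yield batch


def task_to_batches(
    rows: list[Row],
    predicate: Optional[Predicate],
    positional_deletes: Optional[set[int]] = None,
    batch_size: int = 2,
) -> Iterator[list[Row]]:
    """Hypothetical model of the scan path: filter once, in exactly one place."""
    has_deletes = bool(positional_deletes)
    # Assumption: push the predicate down only when no positional deletes exist.
    pushed_down = None if has_deletes else predicate
    next_index = 0  # file position of the first row in the current batch

    for batch in scan_fragment(rows, batch_size, pushed_down):
        if has_deletes:
            start = next_index
            next_index += len(batch)
            # Drop rows by file position before any filtering.
            batch = [
                row
                for offset, row in enumerate(batch)
                if start + offset not in positional_deletes
            ]
            if not batch:
                continue  # every row in this batch was deleted
            if predicate is not None:
                # The only post-scan filter, reached only in the delete path.
                batch = [row for row in batch if predicate(row)]
        if not batch:
            continue  # nothing survived the filter
        yield batch


rows = [{"id": i} for i in range(6)]
keep = lambda row: row["id"] >= 2

# No deletes: the scanner filters, and no second filter pass runs.
print(list(task_to_batches(rows, keep)))
# With deletes at positions 2 and 3: deletes first, then the filter.
print(list(task_to_batches(rows, keep, positional_deletes={2, 3})))
```

In this model the predicate is evaluated once per row in either path, which is the property the first regression test name refers to; the second path also shows a batch that becomes empty after deletes being skipped.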
