GayathriSrividya commented on code in PR #3448:
URL: https://github.com/apache/iceberg-python/pull/3448#discussion_r3371840170


##########
pyiceberg/io/pyarrow.py:
##########
@@ -1676,22 +1676,13 @@ def _task_to_record_batches(
                 # Create the mask of indices that we're interested in
                 indices = _combine_positional_deletes(positional_deletes, 
current_index, current_index + len(batch))
                 current_batch = current_batch.take(indices)
+                if pyarrow_filter is not None:
+                    current_batch = current_batch.filter(pyarrow_filter)

Review Comment:
   Thanks for the suggestion! We actually already have that exact regression 
test in the PR 
(test_task_to_record_batches_filter_after_positional_deletes_empty_result). 
You're right that the len(current_batch) > 0 guard wasn't sufficient — it only 
skips filtering when the batch is already empty before filtering, but doesn't 
protect against a non-empty batch filtering down to zero rows (which is the 
actual PyArrow <21 bug). I've switched back to the pa.Table.from_batches 
workaround as you suggested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to