Fokko commented on code in PR #6775:
URL: https://github.com/apache/iceberg/pull/6775#discussion_r1190314284


##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -721,18 +773,38 @@ def _file_to_table(
         fragment_scanner = ds.Scanner.from_fragment(
             fragment=fragment,
             schema=physical_schema,
-            filter=pyarrow_filter,
+            # This will push down the query to Arrow.
+            # But in case there are positional deletes, we have to apply them first
+            filter=pyarrow_filter if not positional_deletes else None,
             columns=[col.name for col in file_project_schema.columns],
         )
 
+        if positional_deletes:
+            # In the case of a mask, it is a bit awkward because we first

Review Comment:
   This if-branch is not in its final form yet. When a file has positional 
deletes and the user also provides a filter, the filter cannot be pushed down 
to Arrow, so we fetch more records than needed. See 
https://github.com/apache/arrow/issues/35301 for the details.
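
To illustrate why the filter has to wait, here is a minimal plain-Python sketch. It assumes positional deletes refer to row ordinals in the data file, and it uses lists as stand-ins for Arrow tables. The function names are hypothetical and are not part of pyiceberg or PyArrow.

```python
def apply_deletes_then_filter(rows, deleted_positions, predicate):
    """Drop rows by their position in the file first, then apply the user filter."""
    deleted = set(deleted_positions)
    kept = [row for pos, row in enumerate(rows) if pos not in deleted]
    return [row for row in kept if predicate(row)]


def filter_then_apply_deletes(rows, deleted_positions, predicate):
    """Wrong order: filtering first renumbers the rows, so the positions no longer match."""
    deleted = set(deleted_positions)
    filtered = [row for row in rows if predicate(row)]
    return [row for pos, row in enumerate(filtered) if pos not in deleted]


# Hypothetical data file with five rows; rows at positions 1 and 3 are deleted.
rows = [{"id": 0}, {"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
deleted_positions = [1, 3]


def predicate(row):
    # Stand-in for the user-provided filter.
    return row["id"] >= 1


correct = apply_deletes_then_filter(rows, deleted_positions, predicate)
wrong = filter_then_apply_deletes(rows, deleted_positions, predicate)
print(correct)
print(wrong)
```

In this sketch the correct order has to materialize every row of the file before the filter runs, which is the "too many records" cost mentioned above. The wrong order returns exactly the rows that should have been deleted.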




