[GitHub] [iceberg] rdblue commented on a diff in pull request #6258: Python: Implement PyArrow row level filtering

GitBox Sat, 26 Nov 2022 14:10:41 -0800


rdblue commented on code in PR #6258:
URL: https://github.com/apache/iceberg/pull/6258#discussion_r1032837927



##########
python/pyiceberg/table/__init__.py:
##########
@@ -355,7 +355,23 @@ def to_arrow(self):
         if "*" not in self.selected_fields:
             columns = list(self.selected_fields)
 
-        return pq.read_table(source=locations, filesystem=fs, columns=columns)
+        pyarrow_filter = None
+        if self.row_filter is not AlwaysTrue():
+            bound_row_filter = bind(self.table.schema(), self.row_filter)
+            pyarrow_filter = expression_to_pyarrow(bound_row_filter)
+
+        from pyarrow.dataset import dataset
+
+        ds = dataset(
+            source=locations,
+            filesystem=fs,
+            # Optionally provide the Schema for the Dataset,
+            # in which case it will not be inferred from the source.
+            # https://arrow.apache.org/docs/python/generated/pyarrow.dataset.dataset.html#pyarrow.dataset.dataset
+            schema=schema_to_pyarrow(self.table.schema()),
+        )
+
+        return ds.to_table(filter=pyarrow_filter, columns=columns)

Review Comment:
   Dataset seems good to me if you can read it in chunks.
   
   We may also need to refactor this and handle files individually when we 
implement correct projection, so this will probably change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

