Fokko commented on code in PR #6258:
URL: https://github.com/apache/iceberg/pull/6258#discussion_r1033509831
##########
python/pyiceberg/table/__init__.py:
##########
@@ -355,7 +355,23 @@ def to_arrow(self):
if "*" not in self.selected_fields:
columns = list(self.selected_fields)
- return pq.read_table(source=locations, filesystem=fs, columns=columns)
+ pyarrow_filter = None
+ if self.row_filter is not AlwaysTrue():
+ bound_row_filter = bind(self.table.schema(), self.row_filter)
+ pyarrow_filter = expression_to_pyarrow(bound_row_filter)
+
+ from pyarrow.dataset import dataset
+
+ ds = dataset(
+ source=locations,
+ filesystem=fs,
+ # Optionally provide the Schema for the Dataset,
+ # in which case it will not be inferred from the source.
+ # https://arrow.apache.org/docs/python/generated/pyarrow.dataset.dataset.html#pyarrow.dataset.dataset
+ schema=schema_to_pyarrow(self.table.schema()),
Review Comment:
The schema passed here should be equal to, or a subset of, the original
schema. See the example in the PR description.
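
To illustrate the "equal to or a subset of" constraint, here is a minimal sketch of such a check. It is an assumption-laden illustration only: schemas are modeled as plain dicts mapping field name to a type string, and `schema_mismatches` is a hypothetical helper, not part of PyArrow or PyIceberg.

```python
from typing import Dict, List


def schema_mismatches(requested: Dict[str, str], original: Dict[str, str]) -> List[str]:
    """Return the reasons `requested` is not equal to or a subset of `original`.

    Hypothetical helper: both schemas are plain {field name: type string} dicts.
    An empty list means the requested schema is acceptable.
    """
    problems = []
    for name, dtype in requested.items():
        if name not in original:
            # The requested schema names a field the original does not have.
            problems.append(f"field {name!r} is not in the original schema")
        elif original[name] != dtype:
            # The field exists, but with a different type.
            problems.append(
                f"field {name!r} has type {dtype}, expected {original[name]}"
            )
    return problems


# Made-up schema, for illustration only.
original = {"id": "int64", "name": "string", "ts": "timestamp"}

equal_schema = dict(original)
subset_schema = {"id": "int64", "name": "string"}
extra_field_schema = {"id": "int64", "price": "double"}

print(schema_mismatches(equal_schema, original))
print(schema_mismatches(subset_schema, original))
print(schema_mismatches(extra_field_schema, original))
```

An equal schema and a strict subset both produce an empty list, while a schema that names a field missing from the original is reported.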
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]