ndrluis commented on code in PR #2881:
URL: https://github.com/apache/iceberg-python/pull/2881#discussion_r3307479381
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1641,7 +1645,12 @@ def _task_to_record_batches(
     bound_row_filter, file_schema, case_sensitive=case_sensitive, projected_field_values=projected_missing_fields
 )
 bound_file_filter = bind(file_schema, translated_row_filter, case_sensitive=case_sensitive)
-pyarrow_filter = expression_to_pyarrow(bound_file_filter, file_schema)
+try:
+    pyarrow_filter = expression_to_pyarrow(bound_file_filter, file_schema)
+except pyarrow.lib.ArrowNotImplementedError as e:
+    if "arrow.uuid" in str(e):
+        raise NotImplementedError(UUID_FILTER_NOT_SUPPORTED_ERROR_MESSAGE) from e
+    raise
Review Comment:
I think keeping this at the PyArrow translation boundary makes more sense
here. `Table.scan()` is lazy and only returns a `DataScan`, so the Arrow
expression is not built there and this exception would not be raised from
`scan()` itself.
Moving it to `to_arrow()` / `to_arrow_batch_reader()` would also duplicate
PyArrow-specific handling in the public scan layer. Since this error is
caused specifically by translating the bound filter into a PyArrow expression,
I’d keep the exception conversion here, which still surfaces a user-facing
`NotImplementedError`.
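
To illustrate the argument above, here is a minimal, self-contained sketch of the pattern. It does not use PyIceberg or PyArrow: `BackendNotImplementedError`, `backend_translate`, `translate_filter`, `LazyScan`, `scan`, and both error messages are hypothetical stand-ins, assumed only for illustration. It shows that a lazy scan object cannot raise at construction time, and that converting the exception at the translation boundary covers every consumer in one place.

```python
class BackendNotImplementedError(Exception):
    """Hypothetical stand-in for pyarrow.lib.ArrowNotImplementedError."""


# Hypothetical user-facing message; the real constant's text is not shown in the diff.
UUID_FILTER_NOT_SUPPORTED = "Filtering on UUID columns is not supported."


def backend_translate(expr: str) -> str:
    """Toy backend translator with made-up failure messages."""
    if "uuid" in expr:
        raise BackendNotImplementedError("no kernel for extension type arrow.uuid")
    if "geo" in expr:
        raise BackendNotImplementedError("no kernel for type geo")
    return f"compiled({expr})"


def translate_filter(expr: str) -> str:
    """Translation boundary: convert only the known UUID failure."""
    try:
        return backend_translate(expr)
    except BackendNotImplementedError as e:
        if "arrow.uuid" in str(e):
            # Chain the original error so the backend detail stays in the traceback.
            raise NotImplementedError(UUID_FILTER_NOT_SUPPORTED) from e
        raise  # unrelated backend errors propagate unchanged


class LazyScan:
    """Holds the filter; nothing is translated until rows are requested."""

    def __init__(self, row_filter: str) -> None:
        self.row_filter = row_filter

    def to_rows(self) -> str:
        # Every consumer funnels through the same boundary function.
        return translate_filter(self.row_filter)


def scan(row_filter: str) -> LazyScan:
    """Lazy entry point: builds the scan object without translating the filter."""
    return LazyScan(row_filter)
```

In this sketch, `scan("uuid_col == 'abc'")` returns normally, and the `NotImplementedError` appears only when `to_rows()` triggers translation. Any additional consumer method would get the same behavior without repeating the `try`/`except`.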
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]