jorisvandenbossche commented on code in PR #14011:
URL: https://github.com/apache/arrow/pull/14011#discussion_r962607329
##########
python/pyarrow/parquet/core.py:
##########
@@ -2338,9 +2340,10 @@ def __init__(self, path_or_paths, filesystem=None, *,
filters=None,
if decryption_properties is not None:
read_options.update(decryption_properties=decryption_properties)
- # map filters to Expressions
- self._filters = filters
- self._filter_expression = filters and _filters_to_expression(filters)
+ # map filters to Expressions if not already
+ if not isinstance(filters, ds.Expression):
Review Comment:
The `_filters_to_expression` function has this check already as well, so
this shouldn't be needed?
##########
python/pyarrow/parquet/core.py:
##########
@@ -89,7 +89,9 @@ def _check_filters(filters, check_null_strings=True):
Check if filters are well-formed.
"""
if filters is not None:
- if len(filters) == 0 or any(len(f) == 0 for f in filters):
+ if not hasattr(filters, '__len__') \
Review Comment:
I suppose this extra check is for giving an error message when passing an
Expression to the old implementation (when using `use_legacy_dataset=True`) ?
Maybe we can make this a separate check with a more informative error
message (like "Expressions as filter not supported for legacy dataset reader").
##########
python/pyarrow/parquet/core.py:
##########
@@ -89,7 +89,9 @@ def _check_filters(filters, check_null_strings=True):
Check if filters are well-formed.
"""
if filters is not None:
- if len(filters) == 0 or any(len(f) == 0 for f in filters):
+ if not hasattr(filters, '__len__') \
Review Comment:
(also wondering if we have a more exact duck type check, something like
`hasattr(filters, "cast")`)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]