Alenka Frim created ARROW-16199:
-----------------------------------
Summary: [Python] Filters and pq.ParquetDataset/pq.read_table with legacy API
Key: ARROW-16199
URL: https://issues.apache.org/jira/browse/ARROW-16199
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Alenka Frim
Supplying filters to pq.ParquetDataset and pq.read_table when using the legacy
API should produce a better error message:
{code:python}
import pyarrow as pa
import pyarrow.parquet as pq
data = [
    list(range(5)),
    list(map(str, range(5))),
]
schema = pa.schema([
    ('i64', pa.int64()),
    ('str', pa.string()),
])
batch = pa.record_batch(data, schema=schema)
table = pa.Table.from_batches([batch])
pq.write_table(table, 'example.parquet')
{code}
{code:python}
>>> pq.ParquetDataset('example.parquet', use_legacy_dataset=True, filters=[('str', '=', "1")])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/alenkafrim/repos/arrow/python/pyarrow/parquet/__init__.py", line
1755, in __init__
self._filter(filters)
File "/Users/alenkafrim/repos/arrow/python/pyarrow/parquet/__init__.py", line
1933, in _filter
accepts_filter = self._partitions.filter_accepts_partition
AttributeError: 'NoneType' object has no attribute 'filter_accepts_partition'
{code}
{code:python}
>>> pq.read_table('example.parquet', use_legacy_dataset=True, filters=[('str', '=', "1")])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/alenkafrim/repos/arrow/python/pyarrow/parquet/__init__.py", line
2760, in read_table
pf = ParquetDataset(
File "/Users/alenkafrim/repos/arrow/python/pyarrow/parquet/__init__.py", line
1755, in __init__
self._filter(filters)
File "/Users/alenkafrim/repos/arrow/python/pyarrow/parquet/__init__.py", line
1933, in _filter
accepts_filter = self._partitions.filter_accepts_partition
AttributeError: 'NoneType' object has no attribute 'filter_accepts_partition'
{code}
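One possible direction (a sketch only, not the actual pyarrow implementation; the {{_filter}} method and {{_partitions}} attribute names are taken from the tracebacks above, and the error wording is just a suggestion) would be to check for missing partitions up front and raise a descriptive error instead of the AttributeError:
{code:python}
# Illustrative sketch: a stand-in for the legacy ParquetDataset in
# pyarrow/parquet/__init__.py, showing where a clearer check could live.
class LegacyDatasetSketch:
    def __init__(self, partitions=None):
        self._partitions = partitions

    def _filter(self, filters):
        if self._partitions is None:
            # Proposed behaviour: fail with a descriptive message instead
            # of 'NoneType' object has no attribute
            # 'filter_accepts_partition' (wording is only a suggestion).
            raise ValueError(
                "Filters can only be applied to partition columns when "
                "use_legacy_dataset=True, and this dataset has no "
                "partitions; pass use_legacy_dataset=False to filter on "
                "other columns.")
        # The rest of the existing filtering logic on self._partitions
        # would follow here.
{code}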