Alenka Frim created ARROW-16199:
-----------------------------------
Summary: [Python] Filters and pq.ParquetDataset/pq.read_table with legacy API
Key: ARROW-16199
URL: https://issues.apache.org/jira/browse/ARROW-16199
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Alenka Frim
Supplying filters to pq.ParquetDataset and pq.read_table when using the legacy
API should produce a better error message:
{code:python}
import pyarrow as pa
import pyarrow.parquet as pq
data = [
    list(range(5)),
    list(map(str, range(5))),
]
schema = pa.schema([
    ('i64', pa.int64()),
    ('str', pa.string()),
])
batch = pa.record_batch(data, schema=schema)
table = pa.Table.from_batches([batch])
pq.write_table(table, 'example.parquet')
{code}
{code:python}
>>> pq.ParquetDataset('example.parquet', use_legacy_dataset=True, filters=[('str', '=', "1")])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/alenkafrim/repos/arrow/python/pyarrow/parquet/__init__.py", line
1755, in __init__
self._filter(filters)
File "/Users/alenkafrim/repos/arrow/python/pyarrow/parquet/__init__.py", line
1933, in _filter
accepts_filter = self._partitions.filter_accepts_partition
AttributeError: 'NoneType' object has no attribute 'filter_accepts_partition'
{code}
{code:python}
>>> pq.read_table('example.parquet', use_legacy_dataset=True, filters=[('str', '=', "1")])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/alenkafrim/repos/arrow/python/pyarrow/parquet/__init__.py", line
2760, in read_table
pf = ParquetDataset(
File "/Users/alenkafrim/repos/arrow/python/pyarrow/parquet/__init__.py", line
1755, in __init__
self._filter(filters)
File "/Users/alenkafrim/repos/arrow/python/pyarrow/parquet/__init__.py", line
1933, in _filter
accepts_filter = self._partitions.filter_accepts_partition
AttributeError: 'NoneType' object has no attribute 'filter_accepts_partition'
{code}
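One possible direction (a sketch only, not the actual pyarrow implementation; the {{_filter}} method and {{_partitions}} attribute names are taken from the tracebacks above, and the error wording is just a suggestion) would be to check for missing partitions up front and raise a descriptive error instead of the AttributeError:
{code:python}
# Illustrative sketch: a stand-in for the legacy ParquetDataset in
# pyarrow/parquet/__init__.py, showing where a clearer check could live.
class LegacyDatasetSketch:
    def __init__(self, partitions=None):
        self._partitions = partitions

    def _filter(self, filters):
        if self._partitions is None:
            # Proposed behaviour: fail with a descriptive message instead
            # of 'NoneType' object has no attribute
            # 'filter_accepts_partition' (wording is only a suggestion).
            raise ValueError(
                "Filters can only be applied to partition columns when "
                "use_legacy_dataset=True, and this dataset has no "
                "partitions; pass use_legacy_dataset=False to filter on "
                "other columns.")
        # The rest of the existing filtering logic on self._partitions
        # would follow here.
{code}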