raulcd opened a new pull request, #12797:
URL: https://github.com/apache/arrow/pull/12797
This PR fixes a bug that occurred when `ParquetDataset.read()` was used on a
dataset created from a list with a single element.
The following snippet was reported to reproduce the bug:
**Before solution**
```python
In [1]: import pyarrow.parquet as pq
   ...: import pandas as pd
   ...: df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
   ...: df.to_parquet('/tmp/test.parquet', index=False)
   ...: pq.ParquetDataset(['/tmp/test.parquet'], use_legacy_dataset=False).read(use_threads=False).to_pandas()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
ValueError: cannot construct a FileSource from a path without a FileSystem
Exception ignored in: 'pyarrow._dataset._make_file_source'
Traceback (most recent call last):
  File "/home/raulcd/open_source/arrow/python/pyarrow/parquet.py", line 1815, in __init__
    fragment = parquet_format.make_fragment(single_file, filesystem)
ValueError: cannot construct a FileSource from a path without a FileSystem
---------------------------------------------------------------------------
```
**After solution**
```python
In [1]: import pyarrow.parquet as pq
   ...: import pandas as pd
   ...: df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
   ...: df.to_parquet('/tmp/test.parquet', index=False)
   ...: pq.ParquetDataset(['/tmp/test.parquet'], use_legacy_dataset=False).read(use_threads=False).to_pandas()
Out[1]:
   A  B
0  1  a
1  2  b
2  3  c
```