raulcd opened a new pull request, #12797:
URL: https://github.com/apache/arrow/pull/12797

   This PR fixes a bug triggered when a list with a single element was used with `ParquetDataset.read()`.
   
   The following snippet was reported to reproduce the bug:
   
   **Before the fix**
   ```python
   In [1]: import pyarrow.parquet as pq
      ...: import pandas as pd
      ...: df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
      ...: df.to_parquet('/tmp/test.parquet', index=False)
      ...: pq.ParquetDataset(['/tmp/test.parquet'], use_legacy_dataset=False).read(use_threads=False).to_pandas()
   ---------------------------------------------------------------------------
   ValueError                                Traceback (most recent call last)
   ValueError: cannot construct a FileSource from a path without a FileSystem
   Exception ignored in: 'pyarrow._dataset._make_file_source'
   Traceback (most recent call last):
     File "/home/raulcd/open_source/arrow/python/pyarrow/parquet.py", line 1815, in __init__
       fragment = parquet_format.make_fragment(single_file, filesystem)
   ValueError: cannot construct a FileSource from a path without a FileSystem
   ---------------------------------------------------------------------------
   ```
   
   **After the fix**
   ```python
   In [1]: import pyarrow.parquet as pq
      ...: import pandas as pd
      ...: df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
      ...: df.to_parquet('/tmp/test.parquet', index=False)
      ...: pq.ParquetDataset(['/tmp/test.parquet'], use_legacy_dataset=False).read(use_threads=False).to_pandas()
   Out[1]: 
      A  B
   0  1  a
   1  2  b
   2  3  c
   
   ```

