[GitHub] [arrow] raulcd opened a new pull request, #12797: [ARROW-13922] Fix ParquetDataset throw error when len(path_or_paths) == 1

GitBox Tue, 05 Apr 2022 03:38:01 -0700


raulcd opened a new pull request, #12797:
URL: https://github.com/apache/arrow/pull/12797


   This PR fixes a bug that occurred when a list containing a single path was 
passed to `ParquetDataset` and then read with `ParquetDataset.read()`.
   
   The following snippet was reported to reproduce the bug:
   
   **Before the fix**
   ```python
   In [1]: import pyarrow.parquet as pq
      ...: import pandas as pd
      ...: df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
      ...: df.to_parquet('/tmp/test.parquet', index=False)
      ...: pq.ParquetDataset(['/tmp/test.parquet'], use_legacy_dataset=False).read(use_threads=False).to_pandas()
   ---------------------------------------------------------------------------
   ValueError                                Traceback (most recent call last)
   ValueError: cannot construct a FileSource from a path without a FileSystem
   Exception ignored in: 'pyarrow._dataset._make_file_source'
   Traceback (most recent call last):
     File "/home/raulcd/open_source/arrow/python/pyarrow/parquet.py", line 1815, in __init__
       fragment = parquet_format.make_fragment(single_file, filesystem)
   ValueError: cannot construct a FileSource from a path without a FileSystem
   ---------------------------------------------------------------------------
   ```
   
   **After the fix**
   ```python
   In [1]: import pyarrow.parquet as pq
      ...: import pandas as pd
      ...: df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
      ...: df.to_parquet('/tmp/test.parquet', index=False)
      ...: pq.ParquetDataset(['/tmp/test.parquet'], use_legacy_dataset=False).read(use_threads=False).to_pandas()
   Out[1]: 
      A  B
   0  1  a
   1  2  b
   2  3  c
   
   ```

