[ https://issues.apache.org/jira/browse/ARROW-17174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kirby updated ARROW-17174:
-------------------------------
    Summary: [C++] FileSystemDataset FilenamePartitioning error - fsspec filesystem  (was: FileSystemDataset FilenamePartitioning error - fsspec filesystem)

> [C++] FileSystemDataset FilenamePartitioning error - fsspec filesystem
> ----------------------------------------------------------------------
>
>                 Key: ARROW-17174
>                 URL: https://issues.apache.org/jira/browse/ARROW-17174
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 8.0.0
>            Reporter: Adam Kirby
>            Priority: Major
>         Attachments: zip_of_csvs_test.py
>
>
> Unless this is user error (which it may well be!), Dataset 
> FilenamePartitioning on read doesn't seem to work with an fsspec filesystem. 
> From what I can glean, the filenames parse successfully when passed to the 
> parse() method, but the fields are not extracted from the filenames passed 
> to dataset(); instead, they appear as nulls. When I try the partitioning 
> discover() method (assuming this is a reasonable thing to try), I get the 
> traceback below. (Repro Python script attached.)
> Traceback (most recent call last):
>   File "/zip_of_csvs_test.py", line 82, in <module>
>     ds_partitioned = pds.dataset(
>   File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 697, in dataset
>     return _filesystem_dataset(source, **kwargs)
>   File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 449, in _filesystem_dataset
>     return factory.finish(schema)
>   File "pyarrow/_dataset.pyx", line 1857, in pyarrow._dataset.DatasetFactory.finish
>   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: No non-null segments were available for field 'frequency'; couldn't infer type
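
For readers unfamiliar with filename partitioning, the sketch below illustrates the mapping the report expects: leading underscore-separated segments of a file's base name become field values. It is a plain-Python illustration, not pyarrow's implementation. The helper name split_filename_fields, the sample path, and the 'year' field are hypothetical; only the 'frequency' field name comes from the traceback above.

```python
import posixpath


def split_filename_fields(path, field_names, sep="_"):
    """Hypothetical helper: map leading filename segments to field names.

    Illustrative only; this is not how pyarrow is implemented internally.
    """
    # Only the base name carries partition values; directories are ignored.
    base = posixpath.basename(path)
    # Everything after the last separator (e.g. "prices.csv") is the file's
    # own name, not a partition value, so it is dropped.
    values = base.split(sep)[:-1]
    fields = {}
    for index, name in enumerate(field_names):
        has_value = index < len(values) and values[index] != ""
        # A missing or empty segment is reported as None (a null).
        fields[name] = values[index] if has_value else None
    return fields


print(split_filename_fields("archive/daily_2020_prices.csv", ["frequency", "year"]))
print(split_filename_fields("prices.csv", ["frequency"]))
```

If every path handed to discovery produced None for a field, there would be no value from which to infer that field's type, which is consistent with the wording of the ArrowInvalid message. Whether that is what actually happens with fsspec-backed paths is not established by this report.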



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
