[jira] [Commented] (ARROW-17672) [C++] Unrecognized filesystem type in URI: abfss://

Joris Van den Bossche (Jira) Tue, 13 Sep 2022 04:04:32 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-17672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603514#comment-17603514
 ]


Joris Van den Bossche commented on ARROW-17672:
-----------------------------------------------

You are using {{adlfs}}, which is an fsspec-compatible filesystem. Normally I
would expect the pandas {{read_parquet}} call to convert the
"abfss://data.parquet" URI to an fsspec filesystem and pass that to the
underlying pyarrow function. pyarrow does support fsspec filesystems, and that
is how we can support filesystems that don't have native support inside
Arrow C++ (such as Azure at the moment).
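
To illustrate the expected flow, here is a rough sketch rather than the actual
pandas code. {{needs_fsspec}}, {{read_parquet_table}} and the
{{ARROW_NATIVE_SCHEMES}} set are hypothetical names made up for this example,
and the set is only an illustrative subset of the schemes Arrow handles itself.
The fsspec and pyarrow imports are deferred, so the scheme check works without
them installed.

```python
from urllib.parse import urlsplit

# Assumption: an illustrative subset of schemes that pyarrow resolves natively.
# "abfss" is deliberately absent, matching the error in this issue.
ARROW_NATIVE_SCHEMES = {"file", "s3", "hdfs", "viewfs"}


def needs_fsspec(uri):
    """Return True if the URI has a scheme that is not in the native set."""
    scheme = urlsplit(uri).scheme
    return bool(scheme) and scheme not in ARROW_NATIVE_SCHEMES


def read_parquet_table(uri, **storage_options):
    """Read a Parquet file, resolving non-native URIs through fsspec first."""
    import pyarrow.parquet as pq  # imported lazily; requires pyarrow

    if needs_fsspec(uri):
        from fsspec.core import url_to_fs  # requires fsspec (and adlfs for abfss)

        # Turn the URI into an fsspec filesystem plus a path inside it,
        # then hand both to pyarrow instead of the raw URI string.
        fs, path = url_to_fs(uri, **storage_options)
        return pq.read_table(path, filesystem=fs)

    # Plain local paths and native schemes can go to pyarrow directly.
    return pq.read_table(uri)
```

If the raw "abfss://..." string reaches pyarrow without this kind of
conversion, {{FileSystem.from_uri}} is called on it, which is where the
traceback in the issue ends.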

So something is going wrong here. To start, can you indicate which versions of
pyarrow, pandas, fsspec and adlfs you are using (e.g. the output of
{{pip list}} or {{conda list}})?
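
If it is easier to collect the versions from inside the notebook, something
like the following minimal sketch should also work. {{installed_versions}} is a
hypothetical helper name. {{importlib.metadata}} is in the standard library
from Python 3.8 onwards; on Python 3.7 the {{importlib_metadata}} backport
offers the same functions.

```python
from importlib.metadata import PackageNotFoundError, version

# The four packages asked about in this comment.
PACKAGES = ("pyarrow", "pandas", "fsspec", "adlfs")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```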

> [C++] Unrecognized filesystem type in URI: abfss://
> ---------------------------------------------------
>
>                 Key: ARROW-17672
>                 URL: https://issues.apache.org/jira/browse/ARROW-17672
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Parquet, Python
>            Reporter: Prakhar Sandhu
>            Priority: Major
>
> I am running the commands below in Databricks, trying to read a file stored
> in ADLS using pandas:
> {code:python}
> # Shell / notebook step, not Python: pip install adlfs
> import pandas as pd
> data = pd.read_parquet("abfss://data.parquet", storage_options={}){code}
> Then I got the below error: 
> {code:java}
> File "/databricks/python/lib/python3.7/site-packages/pandas/io/parquet.py", 
> line 310, in read_parquet
> return impl.read(path, columns=columns, **kwargs)
> File "/databricks/python/lib/python3.7/site-packages/pandas/io/parquet.py", 
> line 125, in read
> path, columns=columns, **kwargs
> File "/databricks/python/lib/python3.7/site-packages/pyarrow/parquet.py", 
> line 1573, in read_table
> ignore_prefixes=ignore_prefixes,
> File "/databricks/python/lib/python3.7/site-packages/pyarrow/parquet.py", 
> line 1434, in __init__
> ignore_prefixes=ignore_prefixes)
> File "/databricks/python/lib/python3.7/site-packages/pyarrow/dataset.py", 
> line 667, in dataset
> return _filesystem_dataset(source, **kwargs)
> File "/databricks/python/lib/python3.7/site-packages/pyarrow/dataset.py", 
> line 424, in _filesystem_dataset
> fs, paths_or_selector = _ensure_single_source(source, filesystem)
> File "/databricks/python/lib/python3.7/site-packages/pyarrow/dataset.py", 
> line 371, in _ensure_single_source
> filesystem, path = FileSystem.from_uri(path)
> File "pyarrow/_fs.pyx", line 347, in pyarrow._fs.FileSystem.from_uri
> File "pyarrow/error.pxi", line 122, in 
> pyarrow.lib.pyarrow_internal_check_status
> File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Unrecognized filesystem type in URI: 
> abfss:///data.parquet {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
