[ 
https://issues.apache.org/jira/browse/ARROW-16545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536739#comment-17536739
 ] 

Joris Van den Bossche edited comment on ARROW-16545 at 5/13/22 3:53 PM:
------------------------------------------------------------------------

> fsspec's HadoopFilesystem inherits ArrowFSWrapper - so if I wrap it - it's 
> double wrapped

Indeed, the ArrowFSWrapper in fsspec is only meant as a base class to implement 
_fsspec-compatible_ filesystems that are wrapping a pyarrow filesystem. So you 
don't need to wrap an fsspec HadoopFileSystem again in ArrowFSWrapper. 

The pyarrow methods like {{pq.read_table}} do accept fsspec-compatible 
filesystems (under the hood, pyarrow wraps them in a 
{{pyarrow.fs.FSSpecHandler}}). So in your test script, you can pass the 
{{hdfs}} fsspec filesystem object directly to {{pq.read_table}} (which, as you 
noticed, works).

In practice, if you don't need any other fsspec functionality, you could also 
use the pyarrow filesystem directly in this case, since the fsspec 
HadoopFileSystem is just wrapping a pyarrow HadoopFileSystem.


> [Python] pyarrow.parquet.read_table fails with ArrowFSWrapper -> OSError
> ------------------------------------------------------------------------
>
>                 Key: ARROW-16545
>                 URL: https://issues.apache.org/jira/browse/ARROW-16545
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 6.0.1, 8.0.0
>            Reporter: Björn Boschman
>            Priority: Major
>         Attachments: test.py
>
>
> My understanding would be that filesystem=ArrowFSWrapper should work?
> Haven't tried any other Wrapped Filesystems yet
> See attached sample code
> [^test.py]



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
