[
https://issues.apache.org/jira/browse/ARROW-16545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536739#comment-17536739
]
Joris Van den Bossche edited comment on ARROW-16545 at 5/13/22 3:53 PM:
------------------------------------------------------------------------
> fsspec's HadoopFilesystem inherits ArrowFSWrapper - so if I wrap it - it's
> double wrapped
Indeed, fsspec's ArrowFSWrapper is only meant as a base class for implementing
_fsspec-compatible_ filesystems that wrap a pyarrow filesystem. Since the fsspec
HadoopFileSystem already inherits from it, you don't need to wrap it in
ArrowFSWrapper again.
pyarrow functions like {{pq.read_table}} do accept fsspec-compatible
filesystems (under the hood, pyarrow wraps them in a
{{pyarrow.fs.FSSpecHandler}}). So in your test script, you can pass the
{{hdfs}} fsspec filesystem object directly to {{pq.read_table}}, which, as you
noticed, works.
In practice, if you don't need any other fsspec functionality, you could also
use the pyarrow HadoopFileSystem directly, since the fsspec HadoopFileSystem is
just a wrapper around it.
> [Python] pyarrow.parquet.read_table fails with ArrowFSWrapper -> OSError
> ------------------------------------------------------------------------
>
> Key: ARROW-16545
> URL: https://issues.apache.org/jira/browse/ARROW-16545
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 6.0.1, 8.0.0
> Reporter: Björn Boschman
> Priority: Major
> Attachments: test.py
>
>
> My understanding would be that filesystem=ArrowFSWrapper should work?
> Haven't tried any other Wrapped Filesystems yet
> See attached sample code
> [^test.py]