[ 
https://issues.apache.org/jira/browse/ARROW-17898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshul Kanakia updated ARROW-17898:
-----------------------------------
    Description: 
I am using PyArrow 8.0.0. Reading a Parquet file through a PyArrow 
LocalFileSystem object wrapped in ArrowFSWrapper fails with the following error:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq
from fsspec.implementations.arrow import ArrowFSWrapper

lfs = pa.fs.LocalFileSystem()
fs = ArrowFSWrapper(lfs)
pat = pq.read_table("some/file/location.parquet", filesystem=fs)
{code}

{code}
OSError                                   Traceback (most recent call last)
Cell In [12], line 1
----> 1 pat = pq.read_table(
      2     "some/file/location.parquet",
      3     filesystem=fs)

File /usr/local/lib/python3.8/dist-packages/pyarrow/parquet/__init__.py:2737, in read_table(source, columns, use_threads, metadata, schema, use_pandas_metadata, memory_map, read_dictionary, filesystem, filters, buffer_size, partitioning, use_legacy_dataset, ignore_prefixes, pre_buffer, coerce_int96_timestamp_unit, decryption_properties)
   2730             raise ValueError(
   2731                 "The 'metadata' keyword is no longer supported with the new "
   2732                 "datasets-based implementation. Specify "
   2733                 "'use_legacy_dataset=True' to temporarily recover the old "
   2734                 "behaviour."
   2735             )
   2736         try:
-> 2737             dataset = _ParquetDatasetV2(
   2738                 source,
   2739                 schema=schema,
   2740                 filesystem=filesystem,
   2741                 partitioning=partitioning,
   2742                 memory_map=memory_map,
   2743                 read_dictionary=read_dictionary,
   2744                 buffer_size=buffer_size,
   2745                 filters=filters,
   2746                 ignore_prefixes=ignore_prefixes,
...
File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:193, in pyarrow.lib.NativeFile.get_random_access_file()

File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:222, in pyarrow.lib.NativeFile._assert_seekable()

OSError: only valid on seekable files
{code}

If I instead use just the LocalFileSystem object without the ArrowFSWrapper, it 
works as expected.
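
The error text suggests that the file handle reaching the Parquet reader was not seekable. As a rough, stdlib-only illustration of why that matters (this is not PyArrow's implementation): a Parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes PAR1, so a reader has to jump to the end of the file before it can read anything else. The names NonSeekableStream and read_footer_length below are hypothetical, and the file contents are fake bytes laid out like a Parquet tail, not a real Parquet file.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a stream-only file handle (no random access)."""

    def __init__(self, data: bytes):
        super().__init__()
        self._inner = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, buffer) -> int:
        return self._inner.readinto(buffer)


def read_footer_length(f) -> int:
    """Return the metadata length stored in the last 8 bytes of the file."""
    if not f.seekable():
        # Same message as in the traceback; raised here only for illustration.
        raise OSError("only valid on seekable files")
    f.seek(-8, io.SEEK_END)  # the footer lives at the very end of the file
    tail = f.read(8)
    length, magic = struct.unpack("<I4s", tail)
    if magic != PARQUET_MAGIC:
        raise ValueError("not a Parquet file: trailing magic bytes missing")
    return length


# Fake bytes laid out like a Parquet tail: magic, "metadata", length, magic.
fake_metadata = b"x" * 10
fake_file = (
    PARQUET_MAGIC
    + fake_metadata
    + struct.pack("<I", len(fake_metadata))
    + PARQUET_MAGIC
)

print(read_footer_length(io.BytesIO(fake_file)))  # seekable handle: succeeds

try:
    read_footer_length(NonSeekableStream(fake_file))
except OSError as exc:
    print("OSError:", exc)  # stream-only handle: fails like the report above
```

If the wrapped filesystem hands back a stream-only handle, a check of this kind would fail in the same way, which may be worth verifying by calling seekable() on the object returned by fs.open().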

 


  was:
My version of PyArrow=8.0.0. When I attempt to use a PyArrow LocalFileSystem 
object wrapped with ArrowFSWrapper it results in the following error:


import pyarrow as pa
import pyarrow.parquet as pq
from fsspec.implementations.arrow import ArrowFSWrapper

{quote}lfs = pa.fs.LocalFileSystem()
fs = ArrowFSWrapper(lfs)
pat = pq.read_table("some/file/location.parquet", filesystem=fs)
---
{{OSError Traceback (most recent call last) Cell In [12], line 1 ----> 1 pat = 
pq.read_table(  *2* "some/file/location.parquet",  *3* filesystem=fs) File 
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet/__init__.py:2737, in 
read_table(source, columns, use_threads, metadata, schema, use_pandas_metadata, 
memory_map, read_dictionary, filesystem, filters, buffer_size, partitioning, 
use_legacy_dataset, ignore_prefixes, pre_buffer, coerce_int96_timestamp_unit, 
decryption_properties)  *2730* raise ValueError(  *2731* "The 'metadata' 
keyword is no longer supported with the new "  *2732* "datasets-based 
implementation. Specify "  *2733* "'use_legacy_dataset=True' to temporarily 
recover the old "  *2734* "behaviour."  *2735* )  *2736* try: -> 2737 dataset = 
_ParquetDatasetV2(  *2738* source,  *2739* schema=schema,  *2740* 
filesystem=filesystem,  *2741* partitioning=partitioning,  *2742* 
memory_map=memory_map,  *2743* read_dictionary=read_dictionary,  *2744* 
buffer_size=buffer_size,  *2745* filters=filters,  *2746* 
ignore_prefixes=ignore_prefixes,}}
...
File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:193, in 
pyarrow.lib.NativeFile.get_random_access_file() File 
/usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:222, in 
pyarrow.lib.NativeFile._assert_seekable() OSError: only valid on seekable 
files{quote}

If I instead use just the LocalFileSystem object without the ArrowFSWrapper, it 
works as expected.

 



> pyarrow.parquet.read_table fs (filesystem) argument does not work with 
> fsspec.implementations.arrow.ArrowFSWrapper objects
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17898
>                 URL: https://issues.apache.org/jira/browse/ARROW-17898
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, Python
>    Affects Versions: 8.0.1
>         Environment: Python 3.8.10
>            Reporter: Anshul Kanakia
>            Priority: Major
>
> I am using PyArrow 8.0.0. Reading a Parquet file through a PyArrow 
> LocalFileSystem object wrapped in ArrowFSWrapper fails with the following error:
> {code:python}
> import pyarrow as pa
> import pyarrow.parquet as pq
> from fsspec.implementations.arrow import ArrowFSWrapper
> lfs = pa.fs.LocalFileSystem()
> fs = ArrowFSWrapper(lfs)
> pat = pq.read_table("some/file/location.parquet", filesystem=fs)
> {code}
> {code}
> OSError                                   Traceback (most recent call last)
> Cell In [12], line 1
> ----> 1 pat = pq.read_table(
>       2     "some/file/location.parquet",
>       3     filesystem=fs)
>
> File /usr/local/lib/python3.8/dist-packages/pyarrow/parquet/__init__.py:2737, in read_table(source, columns, use_threads, metadata, schema, use_pandas_metadata, memory_map, read_dictionary, filesystem, filters, buffer_size, partitioning, use_legacy_dataset, ignore_prefixes, pre_buffer, coerce_int96_timestamp_unit, decryption_properties)
>    2730             raise ValueError(
>    2731                 "The 'metadata' keyword is no longer supported with the new "
>    2732                 "datasets-based implementation. Specify "
>    2733                 "'use_legacy_dataset=True' to temporarily recover the old "
>    2734                 "behaviour."
>    2735             )
>    2736         try:
> -> 2737             dataset = _ParquetDatasetV2(
>    2738                 source,
>    2739                 schema=schema,
>    2740                 filesystem=filesystem,
>    2741                 partitioning=partitioning,
>    2742                 memory_map=memory_map,
>    2743                 read_dictionary=read_dictionary,
>    2744                 buffer_size=buffer_size,
>    2745                 filters=filters,
>    2746                 ignore_prefixes=ignore_prefixes,
> ...
> File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:193, in pyarrow.lib.NativeFile.get_random_access_file()
>
> File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:222, in pyarrow.lib.NativeFile._assert_seekable()
>
> OSError: only valid on seekable files
> {code}
> If I instead use just the LocalFileSystem object without the ArrowFSWrapper, 
> it works as expected.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
