[ https://issues.apache.org/jira/browse/ARROW-17898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anshul Kanakia updated ARROW-17898:
-----------------------------------
    Description:
My version of PyArrow=8.0.0.

When I attempt to use a PyArrow LocalFileSystem object wrapped with ArrowFSWrapper it results in the following error:
{code:python}
import pyarrow as pa
import pyarrow.parquet as pq
from fsspec.implementations.arrow import ArrowFSWrapper

lfs = pa.fs.LocalFileSystem()
fs = ArrowFSWrapper(lfs)
pat = pq.read_table("some/file/location.parquet", filesystem=fs)
{code}
{code}
{{OSError Traceback (most recent call last)
Cell In [12], line 1
----> 1 pat = pq.read_table(
*2* "some/file/location.parquet",
*3* filesystem=fs)

File /usr/local/lib/python3.8/dist-packages/pyarrow/parquet/__init__.py:2737, in read_table(source, columns, use_threads, metadata, schema, use_pandas_metadata, memory_map, read_dictionary, filesystem, filters, buffer_size, partitioning, use_legacy_dataset, ignore_prefixes, pre_buffer, coerce_int96_timestamp_unit, decryption_properties)
*2730* raise ValueError(
*2731* "The 'metadata' keyword is no longer supported with the new "
*2732* "datasets-based implementation. Specify "
*2733* "'use_legacy_dataset=True' to temporarily recover the old "
*2734* "behaviour."
*2735* )
*2736* try:
-> 2737 dataset = _ParquetDatasetV2(
*2738* source,
*2739* schema=schema,
*2740* filesystem=filesystem,
*2741* partitioning=partitioning,
*2742* memory_map=memory_map,
*2743* read_dictionary=read_dictionary,
*2744* buffer_size=buffer_size,
*2745* filters=filters,
*2746* ignore_prefixes=ignore_prefixes,}}
...
File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:193, in pyarrow.lib.NativeFile.get_random_access_file()
File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:222, in pyarrow.lib.NativeFile._assert_seekable()
OSError: only valid on seekable files
{code}
If I instead use just the LocalFileSystem object without the ArrowFSWrapper, it works as expected.

  was:
My version of PyArrow=8.0.0.

When I attempt to use a PyArrow LocalFileSystem object wrapped with ArrowFSWrapper it results in the following error:

import pyarrow as pa
import pyarrow.parquet as pq
from fsspec.implementations.arrow import ArrowFSWrapper
{quote}lfs = pa.fs.LocalFileSystem()
fs = ArrowFSWrapper(lfs)
pat = pq.read_table("some/file/location.parquet", filesystem=fs)
---
{{OSError Traceback (most recent call last)
Cell In [12], line 1
----> 1 pat = pq.read_table(
*2* "some/file/location.parquet",
*3* filesystem=fs)

File /usr/local/lib/python3.8/dist-packages/pyarrow/parquet/__init__.py:2737, in read_table(source, columns, use_threads, metadata, schema, use_pandas_metadata, memory_map, read_dictionary, filesystem, filters, buffer_size, partitioning, use_legacy_dataset, ignore_prefixes, pre_buffer, coerce_int96_timestamp_unit, decryption_properties)
*2730* raise ValueError(
*2731* "The 'metadata' keyword is no longer supported with the new "
*2732* "datasets-based implementation. Specify "
*2733* "'use_legacy_dataset=True' to temporarily recover the old "
*2734* "behaviour."
*2735* )
*2736* try:
-> 2737 dataset = _ParquetDatasetV2(
*2738* source,
*2739* schema=schema,
*2740* filesystem=filesystem,
*2741* partitioning=partitioning,
*2742* memory_map=memory_map,
*2743* read_dictionary=read_dictionary,
*2744* buffer_size=buffer_size,
*2745* filters=filters,
*2746* ignore_prefixes=ignore_prefixes,}}
...
File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:193, in pyarrow.lib.NativeFile.get_random_access_file()
File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:222, in pyarrow.lib.NativeFile._assert_seekable()
OSError: only valid on seekable files{quote}
If I instead use just the LocalFileSystem object without the ArrowFSWrapper, it works as expected.

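The last frame of the traceback, _assert_seekable(), suggests the reader was handed a file object that does not report itself as seekable. A Parquet file ends with its footer and the magic bytes PAR1, so a reader needs random access to get at them. The sketch below illustrates that kind of check using only the Python standard library. It is not PyArrow's or fsspec's code and does not establish the root cause of this issue; NonSeekableReader and read_trailer are hypothetical names introduced for illustration.

```python
import io


class NonSeekableReader(io.RawIOBase):
    """Hypothetical stream that can be read but not repositioned."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        # Report that random access is unavailable, as a pipe or socket would.
        return False

    def readinto(self, b) -> int:
        chunk = self._buf.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


def read_trailer(f, size: int = 4) -> bytes:
    """Read the last `size` bytes of `f`, which requires seeking to the end."""
    if not f.seekable():
        # Same wording as the error in the report, reproduced by hand here.
        raise OSError("only valid on seekable files")
    f.seek(-size, io.SEEK_END)
    return f.read(size)


# Stand-in bytes shaped like a Parquet file: magic at both ends.
data = b"PAR1" + b"\x00" * 8 + b"PAR1"

print(read_trailer(io.BytesIO(data)))  # seekable in-memory file

try:
    read_trailer(NonSeekableReader(data))
except OSError as exc:
    print("failed:", exc)
```

If the wrapped filesystem hands back a file object whose seekable() returns False, a check of this shape would fail in the same way, whereas the unwrapped LocalFileSystem path would not; that is an assumption consistent with the report, not a confirmed diagnosis.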
> pyarrow.parquet.read_table fs (filesystem) argument does not work with fsspec.implementations.arrow.ArrowFSWrapper objects
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17898
>                 URL: https://issues.apache.org/jira/browse/ARROW-17898
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, Python
>    Affects Versions: 8.0.1
>         Environment: Python 3.8.10
>            Reporter: Anshul Kanakia
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)