Anshul Kanakia created ARROW-17898:
--------------------------------------
Summary: pyarrow.parquet.read_table fs (filesystem) argument does
not work with fsspec.implementations.arrow.ArrowFSWrapper objects
Key: ARROW-17898
URL: https://issues.apache.org/jira/browse/ARROW-17898
Project: Apache Arrow
Issue Type: Bug
Components: Parquet, Python
Affects Versions: 8.0.1
Environment: Python 3.8.10
Reporter: Anshul Kanakia
Fix For: 8.0.2
My PyArrow version is 8.0.0. When I pass a PyArrow LocalFileSystem object wrapped
in fsspec's ArrowFSWrapper as the filesystem argument of pyarrow.parquet.read_table,
I get the following error:
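{quote}
import pyarrow as pa
import pyarrow.fs  # import the fs submodule explicitly so that pa.fs is available
import pyarrow.parquet as pq
from fsspec.implementations.arrow import ArrowFSWrapper

# Wrap a native pyarrow filesystem in fsspec's ArrowFSWrapper
lfs = pa.fs.LocalFileSystem()
fs = ArrowFSWrapper(lfs)

# Passing the wrapper as the filesystem argument raises the OSError below
pat = pq.read_table("some/file/location.parquet", filesystem=fs)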

OSError                                   Traceback (most recent call last)
Cell In [12], line 1
----> 1 pat = pq.read_table(
      2     "some/file/location.parquet",
      3     filesystem=fs)

File /usr/local/lib/python3.8/dist-packages/pyarrow/parquet/__init__.py:2737, in read_table(source, columns, use_threads, metadata, schema, use_pandas_metadata, memory_map, read_dictionary, filesystem, filters, buffer_size, partitioning, use_legacy_dataset, ignore_prefixes, pre_buffer, coerce_int96_timestamp_unit, decryption_properties)
   2730         raise ValueError(
   2731             "The 'metadata' keyword is no longer supported with the new "
   2732             "datasets-based implementation. Specify "
   2733             "'use_legacy_dataset=True' to temporarily recover the old "
   2734             "behaviour."
   2735         )
   2736     try:
-> 2737         dataset = _ParquetDatasetV2(
   2738             source,
   2739             schema=schema,
   2740             filesystem=filesystem,
   2741             partitioning=partitioning,
   2742             memory_map=memory_map,
   2743             read_dictionary=read_dictionary,
   2744             buffer_size=buffer_size,
   2745             filters=filters,
   2746             ignore_prefixes=ignore_prefixes,
...

File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:193, in pyarrow.lib.NativeFile.get_random_access_file()

File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:222, in pyarrow.lib.NativeFile._assert_seekable()

OSError: only valid on seekable files
{quote}
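The last frames of the traceback show the error coming from NativeFile._assert_seekable, i.e. read_table needs a random-access (seekable) file. A Parquet reader cannot simply stream from the start: a Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes PAR1, so the reader must seek to the end first. The stdlib-only sketch below illustrates that requirement; it is not pyarrow's implementation, and read_footer_length and ForwardOnlyStream are hypothetical names used only for illustration.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"


def read_footer_length(f):
    """Return the footer (file metadata) length stored at the end of a Parquet file.

    Illustration only, not pyarrow code: the last 8 bytes of a Parquet file are
    a 4-byte little-endian footer length followed by b"PAR1", so the reader
    has to seek relative to the end of the file before parsing anything.
    """
    if not f.seekable():
        # Same message pyarrow reports for non-seekable inputs.
        raise OSError("only valid on seekable files")
    f.seek(-8, io.SEEK_END)
    tail = f.read(8)
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file: trailing magic bytes missing")
    (footer_len,) = struct.unpack("<I", tail[:4])
    return footer_len


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical stand-in for a forward-only (non-seekable) input stream."""

    def __init__(self, data):
        super().__init__()
        self._buf = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, b):
        # Copy up to len(b) bytes from the underlying buffer into b.
        chunk = self._buf.read(len(b))
        b[: len(chunk)] = chunk
        return len(chunk)
```

A seekable object such as io.BytesIO passes the check, while the same bytes behind ForwardOnlyStream raise OSError, which matches the shape of the failure in the traceback.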
If I pass the LocalFileSystem object directly, without the ArrowFSWrapper, it
works as expected.