[ https://issues.apache.org/jira/browse/ARROW-17898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612246#comment-17612246 ]
Alenka Frim commented on ARROW-17898:
-------------------------------------
I haven't used {{ArrowFSWrapper}} or {{fsspec}} before, but judging by the docs
this should work:
{code:python}
import pyarrow as pa
import pyarrow.parquet as pq
from fsspec.implementations.arrow import ArrowFSWrapper
from pyarrow import fs

# Wrap pyarrow's local filesystem in the fsspec-compatible wrapper
local = fs.LocalFileSystem()
local_fsspec = ArrowFSWrapper(local)

table = pa.table({'year': [2020, 2022, 2021, 2022, 2019, 2021],
                  'n_legs': [2, 2, 4, 4, 5, 100]})

# Write the table, then read it back through the wrapped filesystem
pq.write_table(table, 'example.parquet', filesystem=local_fsspec)
pq.read_table("example.parquet", filesystem=local_fsspec)
{code}
and it also fails for me (pyarrow 8.0.0 and 9.0.0):
{code:java}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/alenkafrim/repos/pyarrow-triaging-9/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 2780, in read_table
    dataset = _ParquetDatasetV2(
  File "/Users/alenkafrim/repos/pyarrow-triaging-9/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 2368, in __init__
    [fragment], schema=schema or fragment.physical_schema,
  File "pyarrow/_dataset.pyx", line 898, in pyarrow._dataset.Fragment.physical_schema.__get__
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/io.pxi", line 265, in pyarrow.lib.NativeFile.tell
  File "pyarrow/io.pxi", line 197, in pyarrow.lib.NativeFile.get_random_access_file
  File "pyarrow/io.pxi", line 226, in pyarrow.lib.NativeFile._assert_seekable
OSError: only valid on seekable files
{code}
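The last frames of the traceback ({{NativeFile._assert_seekable}}) suggest that the file handle reaching the Parquet reader does not support random access. A Parquet reader has to seek because the file metadata sits at the end of the file, which closes with the 4-byte magic {{PAR1}}. The sketch below mimics that kind of check using only the standard library. {{NonSeekableStream}} and {{read_trailing_magic}} are hypothetical names made up for illustration, and this is not pyarrow's actual implementation.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for a stream-only (forward-read) file handle."""

    def __init__(self, data: bytes) -> None:
        super().__init__()
        self._buf = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        # A pure input stream cannot jump to an arbitrary offset.
        return False

    def readinto(self, b) -> int:
        return self._buf.readinto(b)


def read_trailing_magic(f, size: int = 4) -> bytes:
    """Read the last `size` bytes of a file, refusing non-seekable handles.

    Assumption: the error message is copied from the traceback above only
    to make the parallel obvious.
    """
    if not f.seekable():
        raise OSError("only valid on seekable files")
    f.seek(-size, io.SEEK_END)
    return f.read(size)


# Made-up bytes that only imitate the leading/trailing Parquet magic.
data = b"PAR1" + b"...column chunks and footer..." + b"PAR1"

seekable_result = read_trailing_magic(io.BytesIO(data))

try:
    read_trailing_magic(NonSeekableStream(data))
    stream_error = None
except OSError as exc:
    stream_error = str(exc)
```

A seekable handle such as {{io.BytesIO}} passes the check, while the stream-only wrapper raises the same message as in the traceback. Whether {{ArrowFSWrapper}} actually hands back a stream-only handle here is an assumption that would need checking against the fsspec source.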
> pyarrow.parquet.read_table fs (filesystem) argument does not work with
> fsspec.implementations.arrow.ArrowFSWrapper objects
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-17898
> URL: https://issues.apache.org/jira/browse/ARROW-17898
> Project: Apache Arrow
> Issue Type: Bug
> Components: Parquet, Python
> Affects Versions: 8.0.1
> Environment: Python 3.8.10
> Reporter: Anshul Kanakia
> Priority: Major
> Fix For: 8.0.2
>
>
> I am using PyArrow 8.0.0. When I attempt to use a PyArrow LocalFileSystem
> object wrapped with ArrowFSWrapper, I get the following error:
> import pyarrow as pa
> import pyarrow.parquet as pq
> from fsspec.implementations.arrow import ArrowFSWrapper
> {quote}lfs = pa.fs.LocalFileSystem()
> fs = ArrowFSWrapper(lfs)
> pat = pq.read_table("some/file/location.parquet", filesystem=fs)
> ---
> {{OSError Traceback (most recent call last)
> Cell In [12], line 1
> ----> 1 pat = pq.read_table(
>       2 "some/file/location.parquet",
>       3 filesystem=fs)
> File /usr/local/lib/python3.8/dist-packages/pyarrow/parquet/__init__.py:2737, in read_table(source, columns, use_threads, metadata, schema, use_pandas_metadata, memory_map, read_dictionary, filesystem, filters, buffer_size, partitioning, use_legacy_dataset, ignore_prefixes, pre_buffer, coerce_int96_timestamp_unit, decryption_properties)
>    2730 raise ValueError(
>    2731 "The 'metadata' keyword is no longer supported with the new "
>    2732 "datasets-based implementation. Specify "
>    2733 "'use_legacy_dataset=True' to temporarily recover the old "
>    2734 "behaviour."
>    2735 )
>    2736 try:
> -> 2737 dataset = _ParquetDatasetV2(
>    2738 source,
>    2739 schema=schema,
>    2740 filesystem=filesystem,
>    2741 partitioning=partitioning,
>    2742 memory_map=memory_map,
>    2743 read_dictionary=read_dictionary,
>    2744 buffer_size=buffer_size,
>    2745 filters=filters,
>    2746 ignore_prefixes=ignore_prefixes,}}
> ...
> File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:193, in pyarrow.lib.NativeFile.get_random_access_file()
> File /usr/local/lib/python3.8/dist-packages/pyarrow/io.pxi:222, in pyarrow.lib.NativeFile._assert_seekable()
> OSError: only valid on seekable files{quote}
> If I instead use just the LocalFileSystem object without the ArrowFSWrapper,
> it works as expected.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)