[
https://issues.apache.org/jira/browse/ARROW-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Weston Pace updated ARROW-9938:
-------------------------------
Labels: filesystem good-first-issue (was: filesystem)
> [Python] Add filesystem capabilities to other IO formats (feather, csv, json,
> ..)
> ---------------------------------------------------------------------------------
>
> Key: ARROW-9938
> URL: https://issues.apache.org/jira/browse/ARROW-9938
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Priority: Major
> Labels: filesystem, good-first-issue
>
> In the parquet IO functions, we support reading/writing files from non-local
> filesystems directly (in addition to passing a buffer) by:
> - passing a URI (e.g. {{pq.read_table("s3://bucket/data.parquet")}})
> - specifying the filesystem keyword (e.g.
> {{pq.read_table("bucket/data.parquet", filesystem=S3FileSystem(...))}})
> On the other hand, for other file formats such as feather, we only support
> local files or buffers. So for those, you need a more manual approach (I
> _suppose_ this works?):
> {code:python}
> from pyarrow import fs, feather
> s3 = fs.S3FileSystem()
> with s3.open_input_file("bucket/data.arrow") as file:
>     table = feather.read_table(file)
> {code}
> So I think the question comes up: do we want to extend this filesystem
> support to other file formats (feather, csv, json) and make this more uniform
> across pyarrow, or do we prefer to keep the plain readers more low-level (and
> people can use the datasets API for more convenience)?
> cc [~apitrou] [~kszucs]
--
This message was sent by Atlassian Jira
(v8.20.1#820001)