[ https://issues.apache.org/jira/browse/ARROW-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434325#comment-17434325 ]
Joris Van den Bossche commented on ARROW-12681:
-----------------------------------------------
In the context of ARROW-14470 for the Feather reader, we have been looking a
bit into the IpcReadOptions.
Some observations / questions:
- For writing, we already expose IpcWriteOptions in Python, so exposing
IpcReadOptions as well would be consistent, although I agree that adding
a {{columns}} keyword would be more user-friendly.
- Typically (for the other readers we have), a {{columns}} keyword for reading
only a subset of columns is exposed in the "read" function. For
RecordBatchFileReader, however, the options are passed when opening the reader.
So in the Python API it would be {{RecordBatchFileReader(source,
columns=...).read_all()}} rather than
{{RecordBatchFileReader(source).read_all(columns=...)}}. Are we OK with that
discrepancy on the Python side?
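
To make the two call shapes concrete, here is a minimal toy sketch. The two classes are hypothetical stand-ins written only for illustration; they are not pyarrow APIs, and the "file" is just a dict mapping column names to values. The only point is where the column selection is supplied: when opening the reader, or in the read call.

```python
# Toy stand-ins only: these classes are NOT pyarrow APIs. A "file" is modeled
# as a dict of column name -> list of values, to contrast the two call shapes.


class ReaderWithColumnsAtOpen:
    """Column selection is fixed when the reader is opened."""

    def __init__(self, source, columns=None):
        self._source = source
        # The selection becomes part of the reader's state and applies to
        # every later read from this reader.
        self._columns = list(source) if columns is None else list(columns)

    def read_all(self):
        return {name: self._source[name] for name in self._columns}


class ReaderWithColumnsAtRead:
    """Column selection is chosen per read call."""

    def __init__(self, source):
        self._source = source

    def read_all(self, columns=None):
        # Each call can ask for a different subset of columns.
        names = list(self._source) if columns is None else list(columns)
        return {name: self._source[name] for name in names}


source = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}

# Shape 1: columns passed when opening the reader.
at_open = ReaderWithColumnsAtOpen(source, columns=["a", "c"]).read_all()

# Shape 2: columns passed to the read function.
at_read = ReaderWithColumnsAtRead(source).read_all(columns=["a", "c"])
```

Both shapes return the same data for a single read. In this toy, the difference only shows up when one reader is reused: the first shape would need a new reader to get a different subset, while the second can change the subset on every call.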
> [Python] Expose IpcReadOptions to ipc facility
> ----------------------------------------------
>
> Key: ARROW-12681
> URL: https://issues.apache.org/jira/browse/ARROW-12681
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Francois Saint-Jacques
> Priority: Minor
>
> I would like to be able to read only a subset of columns from a given IPC
> file. To do this, we need to expose the EXPERIMENTAL (is it still?)
> IpcReadOptions.included_fields option. The reason is that the file is on
> remote storage and can't be mmapped, so I want to minimize network transfer.
> I do not know the best way to "pythonize" IpcReadOptions and would need
> help on this.