[ https://issues.apache.org/jira/browse/ARROW-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-8250:
--------------------------------
Fix Version/s: (was: 1.0.0)
> [C++] Add "random access" / slice read API to RecordBatchFileReader
> -------------------------------------------------------------------
>
> Key: ARROW-8250
> URL: https://issues.apache.org/jira/browse/ARROW-8250
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
>
> If you want to read a small section of a file, there is currently no easy way
> to determine which record batches need "rehydrating".
> I would propose the following:
> * A way to cheaply read (and cache, so this doesn't have to be done multiple
> times) all the RecordBatch metadata without deserializing the record batch
> data structures themselves
> * Based on the metadata you can then determine the range of batches that need
> to be rehydrated and then sliced accordingly to produce the Table of interest
> This functionality could also be lifted into the Feather read APIs.
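
The second bullet above (mapping a requested row range onto the batches that
must be rehydrated) can be sketched independently of Arrow. The following is
only an illustration in plain Python, not the proposed C++ API: the function
name plan_slice_read and its parameters are hypothetical, and it assumes the
per-batch row counts have already been obtained from the cached RecordBatch
metadata.

```python
from bisect import bisect_right
from itertools import accumulate


def plan_slice_read(batch_lengths, offset, length):
    """Hypothetical helper: map a row range onto record batches.

    batch_lengths: row count of each batch, assumed to come from cached metadata.
    Returns a list of (batch_index, start_in_batch, rows_to_take) tuples that
    together cover rows [offset, offset + length) of the logical table.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")

    # starts[i] is the global index of the first row of batch i;
    # starts[-1] is the total number of rows in the file.
    starts = [0, *accumulate(batch_lengths)]
    stop = min(offset + length, starts[-1])

    plan = []
    if offset >= stop:
        return plan  # empty or out-of-range request: nothing to rehydrate

    # Binary search for the batch containing the first requested row.
    i = bisect_right(starts, offset) - 1
    while i < len(batch_lengths) and starts[i] < stop:
        lo = max(offset, starts[i]) - starts[i]
        hi = min(stop, starts[i + 1]) - starts[i]
        if hi > lo:  # skip zero-length batches
            plan.append((i, lo, hi - lo))
        i += 1
    return plan


# Three batches of 100, 50 and 200 rows; request rows [120, 180).
print(plan_slice_read([100, 50, 200], offset=120, length=60))
```

Only the batches named in the returned plan would need to be deserialized; each
would then be sliced by (start_in_batch, rows_to_take) and the pieces
concatenated into the resulting Table.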