[jira] [Updated] (ARROW-8250) [C++] Add "random access" / slice read API to RecordBatchFileReader

Wes McKinney (Jira) Tue, 02 Jun 2020 09:29:28 -0700


     [ https://issues.apache.org/jira/browse/ARROW-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-8250:
--------------------------------
    Fix Version/s:     (was: 1.0.0)

> [C++] Add "random access" / slice read API to RecordBatchFileReader
> -------------------------------------------------------------------
>
>                 Key: ARROW-8250
>                 URL: https://issues.apache.org/jira/browse/ARROW-8250
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> If you want to read only a small section of a file, there is currently no easy 
> way to determine which record batches need "rehydrating".
> I would propose the following:
> * A way to cheaply read (and cache, so that it is done only once) all of the 
> RecordBatch metadata without deserializing the record batch data structures 
> themselves
> * Based on that metadata, you can then determine the range of batches that need 
> to be rehydrated, and slice them accordingly to produce the Table of interest
> This functionality could also be lifted into the Feather read APIs.
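
To make the second bullet concrete, here is a minimal sketch of the batch-range planning step. It is not part of the Arrow API and uses only the C++ standard library. The names BatchSlice and PlanSliceRead are hypothetical, and the per-batch row counts are assumed to have already been collected from the cached record batch metadata.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical description of what to keep from one record batch.
struct BatchSlice {
  int batch_index;  // index of the record batch to deserialize
  int64_t offset;   // first row to keep within that batch
  int64_t length;   // number of rows to keep from that batch
};

// Given the row count of every batch (assumed to be known from metadata
// alone, and non-negative), return the batches that overlap the row range
// [row_offset, row_offset + row_count) and the slice to take from each.
std::vector<BatchSlice> PlanSliceRead(const std::vector<int64_t>& batch_lengths,
                                      int64_t row_offset, int64_t row_count) {
  std::vector<BatchSlice> plan;
  if (row_offset < 0 || row_count <= 0) return plan;

  // ends[i] is the total number of rows in batches 0..i.
  std::vector<int64_t> ends(batch_lengths.size());
  std::partial_sum(batch_lengths.begin(), batch_lengths.end(), ends.begin());

  // Binary search for the first batch whose cumulative end exceeds
  // row_offset, i.e. the first batch containing a requested row.
  auto it = std::upper_bound(ends.begin(), ends.end(), row_offset);

  int64_t remaining = row_count;
  for (; it != ends.end() && remaining > 0; ++it) {
    const auto i = it - ends.begin();
    const int64_t batch_start = *it - batch_lengths[i];
    const int64_t local_offset = std::max<int64_t>(row_offset - batch_start, 0);
    const int64_t take = std::min(batch_lengths[i] - local_offset, remaining);
    if (take > 0) {
      plan.push_back({static_cast<int>(i), local_offset, take});
    }
    remaining -= take;
  }
  return plan;
}
```

For batch lengths {100, 100, 50} and a request for 75 rows starting at row 150, the plan covers only batches 1 and 2 (rows 50 to 99 of batch 1, rows 0 to 24 of batch 2), so batch 0 never needs to be deserialized. A real implementation would then read just those batches and slice each one before assembling the Table.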



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
