[jira] [Commented] (ARROW-1012) [C++] Create a configurable implementation of RecordBatchReader that reads from Apache Parquet files

Hatem Helal (JIRA) Fri, 21 Jun 2019 06:04:20 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869452#comment-16869452
 ]


Hatem Helal commented on ARROW-1012:
------------------------------------

I added my use cases to the PR here:

[https://github.com/apache/arrow/pull/4304#issuecomment-504417220]

 

The primary use-cases for this API that I have come across are:
 # Iteratively reading {{N}} rows at a time from a Parquet file. This might be 
necessary if the deserialized row group would be too large to fit in memory.
 # Cheap-preview: with this change we can efficiently read the first {{N}} rows 
of a Parquet file.
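
To make the two use cases concrete, here is a minimal sketch of the access pattern. It is a toy model only: {{ToyBatchReader}}, {{CountBatches}} and {{Preview}} are hypothetical names, the "file" is an in-memory vector of ints, and none of this is the Arrow or Parquet API. It only assumes a reader that hands back at most {{batch_size}} rows per call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Parquet-backed reader. It serves rows from an
// in-memory vector; it is NOT the Arrow API, just a model of the pattern.
class ToyBatchReader {
 public:
  ToyBatchReader(std::vector<int> rows, std::size_t batch_size)
      : rows_(std::move(rows)), batch_size_(batch_size), pos_(0) {
    assert(batch_size_ > 0);  // a zero batch size would never make progress
  }

  // Fills `out` with the next batch of at most batch_size rows.
  // Returns false (and leaves `out` empty) once all rows have been read.
  bool ReadNext(std::vector<int>* out) {
    out->clear();
    if (pos_ >= rows_.size()) return false;
    const std::size_t end = std::min(pos_ + batch_size_, rows_.size());
    out->assign(rows_.begin() + pos_, rows_.begin() + end);
    pos_ = end;
    return true;
  }

 private:
  std::vector<int> rows_;
  std::size_t batch_size_;
  std::size_t pos_;
};

// Use case 1: iterate over the data N rows at a time, so only one batch
// needs to be held by the caller at once. Returns the number of batches.
std::size_t CountBatches(ToyBatchReader reader) {
  std::vector<int> batch;
  std::size_t batches = 0;
  while (reader.ReadNext(&batch)) {
    ++batches;  // a real caller would process `batch` here
  }
  return batches;
}

// Use case 2: cheap preview. Read only the first n rows (n > 0) and stop.
std::vector<int> Preview(const std::vector<int>& rows, std::size_t n) {
  ToyBatchReader reader(rows, n);
  std::vector<int> first;
  reader.ReadNext(&first);
  return first;
}
```

With five rows and a batch size of two, the loop in {{CountBatches}} sees batches of 2, 2 and 1 rows, and {{Preview}} stops after the first batch without touching the rest.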

> [C++] Create a configurable implementation of RecordBatchReader that reads 
> from Apache Parquet files
> ----------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-1012
>                 URL: https://issues.apache.org/jira/browse/ARROW-1012
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Hatem Helal
>            Priority: Major
>              Labels: parquet, pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This will be enabled by -ARROW-1008.-
> A preliminary implementation of an {{arrow::RecordBatchReader}} was added in 
> PARQUET-1166 but does not support configuring the batch size.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
