[ 
https://issues.apache.org/jira/browse/PARQUET-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293962#comment-17293962
 ] 

Weston Pace commented on PARQUET-1993:
--------------------------------------

> Is it to have full async Parquet reading?

Yes.  Streaming & async Parquet reading with readahead.

You could have...
{code:java}
Future<RecordBatch> ReadNext()
{code}
...but with pre-fetching that makes it difficult to figure out readahead.  
Consider what happens if I decide to add 4 calls' worth of readahead and the 
reader decides that the underlying table consists of many small row groups, so 
it prefetches 20 record batches at once.  Then I end up leaving the I/O idle.
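
To make the idle-I/O concern concrete, here is a rough toy model (a sketch, not 
the Parquet C++ API: Simulate, SimResult, and all parameters are hypothetical). 
It assumes discrete time steps, one batch consumed per step, and a reader that 
only issues the I/O for a chunk of batches once some ReadNext() call asks for a 
batch inside that chunk.

```cpp
#include <algorithm>
#include <vector>

// Toy model only; none of these names exist in the Parquet C++ API.
struct SimResult {
  int idle_steps;   // steps with no I/O in flight while chunks remain unrequested
  int total_steps;  // steps until every batch has been consumed
};

// Assumes readahead_calls >= 1 and batches_per_io >= 1.
SimResult Simulate(int num_batches, int batches_per_io, int readahead_calls,
                   int io_latency_steps) {
  const int num_chunks = (num_batches + batches_per_io - 1) / batches_per_io;
  std::vector<int> done_at;  // completion time of each issued I/O request
  int consumed = 0;
  int idle_steps = 0;
  int t = 0;
  for (; consumed < num_batches; ++t) {
    // The consumer keeps `readahead_calls` ReadNext() calls outstanding.
    const int requested = std::min(num_batches, consumed + readahead_calls);
    const int chunks_needed = (requested + batches_per_io - 1) / batches_per_io;
    // The reader issues one I/O request per chunk touched by those calls.
    while (static_cast<int>(done_at.size()) < chunks_needed) {
      done_at.push_back(t + io_latency_steps);
    }
    // I/O is idle if nothing is in flight although chunks remain to be fetched.
    const bool in_flight = std::any_of(done_at.begin(), done_at.end(),
                                       [t](int done) { return done > t; });
    if (!in_flight && static_cast<int>(done_at.size()) < num_chunks) {
      ++idle_steps;
    }
    // Consume one batch per step if its chunk has arrived.
    if (done_at[consumed / batches_per_io] <= t) {
      ++consumed;
    }
  }
  return {idle_steps, t};
}
```

Under these assumptions, Simulate(40, 20, 4, 5) reports 17 idle steps out of 47, 
because the second chunk is not requested until the 4 outstanding calls reach 
past the first 20 batches.  Simulate(40, 20, 24, 5) reports 0 idle steps out of 
45, but only because the caller happened to pick a readahead larger than the 
reader's chunk size, which it has no way of knowing in advance.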

Another approach could be to push the readahead into the Parquet reader.  I'm 
not sure which would be easier.

> [C++] Expose when prefetching completes
> ---------------------------------------
>
>                 Key: PARQUET-1993
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1993
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: David Li
>            Assignee: David Li
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As a follow up to PARQUET-1820, we should let an application be notified when 
> pre-buffering has completed (e.g. PreBuffer() should return Future<void>). 
> This would let an application pre-buffer some amount of data (across multiple 
> files and/or row groups) and then decode data as it becomes available instead 
> of blocking.
> A more ergonomic API would be to expose Future<RecordBatchReader> or 
> something along those lines.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
