David Li created ARROW-12522:
--------------------------------

             Summary: [C++] Implement asynchronous/"lazy" variants of 
ReadRangeCache
                 Key: ARROW-12522
                 URL: https://issues.apache.org/jira/browse/ARROW-12522
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: David Li
            Assignee: David Li
             Fix For: 5.0.0


Currently, ReadRangeCache performs both readahead and coalescing, and it 
exposes a primarily blocking API. Two improvements would be useful for 
implementing async-generator versions of file readers:
 * A method to get a Future<> for a set of read ranges, so that you can 
asynchronously wait for ranges to be read instead of attempting to read and 
getting blocked
 * A way to make the cache not perform readahead, so that data is fetched only 
when requested. (Then, consumers could handle readahead by making multiple 
requests to the cache.)
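
As a rough illustration of the two points above, here is a minimal sketch of a lazy cache built on std::future over an in-memory "file". Everything in it (LazyRangeCache, ReadRange, Cache, WaitFor, Read, num_reads) is a hypothetical stand-in chosen for this example, not the existing ReadRangeCache API, and it does not coalesce.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a byte range; not Arrow's actual type.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Toy lazy cache over an in-memory "file". All names and signatures are
// illustrative assumptions.
class LazyRangeCache {
 public:
  explicit LazyRangeCache(std::string file) : file_(std::move(file)) {}

  // Record the ranges of interest without performing any read (no readahead).
  void Cache(const std::vector<ReadRange>& ranges) {
    for (const ReadRange& r : ranges) {
      assert(r.offset >= 0 && r.length >= 0);
      const Key key{r.offset, r.length};
      if (entries_.count(key) != 0) continue;  // already registered
      // std::launch::deferred postpones the read until something waits on it.
      entries_[key] = std::async(std::launch::deferred, [this, r] {
                        ++num_reads_;
                        return file_.substr(static_cast<std::size_t>(r.offset),
                                            static_cast<std::size_t>(r.length));
                      }).share();
    }
  }

  // Return a future that completes once all the given ranges are loaded.
  // The ranges must have been passed to Cache() first.
  std::future<void> WaitFor(const std::vector<ReadRange>& ranges) {
    std::vector<std::shared_future<std::string>> pending;
    for (const ReadRange& r : ranges) {
      pending.push_back(entries_.at(Key{r.offset, r.length}));
    }
    // A background thread triggers the deferred reads, so the caller is not
    // blocked until it chooses to call get() or wait() on the result.
    return std::async(std::launch::async, [pending] {
      for (const auto& f : pending) f.wait();
    });
  }

  // Blocking read of a single registered range.
  std::string Read(ReadRange r) {
    return entries_.at(Key{r.offset, r.length}).get();
  }

  int num_reads() const { return num_reads_.load(); }

 private:
  using Key = std::pair<int64_t, int64_t>;
  std::string file_;
  std::map<Key, std::shared_future<std::string>> entries_;
  std::atomic<int> num_reads_{0};
};
```

In this sketch the cache object must outlive every future it hands out, since the deferred reads capture a pointer to it. A consumer that wants readahead would simply call WaitFor for the next few ranges before it needs them.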

The cache would still act as an actual cache and would still coalesce. (A 
further improvement might be to allow discarding cache entries. For the purpose 
of getting AsyncGenerator<RecordBatch>, we don't need a range more than once, 
so the cache is just wasting memory.)
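
For reference, coalescing here means merging nearby requested ranges into fewer, larger reads. The following is only a sketch of that idea under simple assumptions: CoalesceRanges and its hole_size_limit parameter are names picked for this example, and the real cache may apply additional limits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a byte range; not Arrow's actual type.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Merge ranges that overlap or whose gap is at most `hole_size_limit` bytes.
// Returns the merged ranges sorted by offset.
std::vector<ReadRange> CoalesceRanges(std::vector<ReadRange> ranges,
                                      int64_t hole_size_limit) {
  assert(hole_size_limit >= 0);
  std::sort(ranges.begin(), ranges.end(),
            [](const ReadRange& a, const ReadRange& b) {
              return a.offset < b.offset;
            });
  std::vector<ReadRange> out;
  for (const ReadRange& r : ranges) {
    if (!out.empty()) {
      ReadRange& last = out.back();
      const int64_t last_end = last.offset + last.length;
      if (r.offset - last_end <= hole_size_limit) {
        // Close enough: extend the previous range to cover this one.
        const int64_t new_end = std::max(last_end, r.offset + r.length);
        last.length = new_end - last.offset;
        continue;
      }
    }
    out.push_back(r);  // too far from the previous range: start a new one
  }
  return out;
}
```

With a hole limit of 4 bytes, the ranges (0, 10), (12, 8) and (100, 5) would become (0, 20) and (100, 5): the first two are fetched in one read at the cost of 2 unneeded bytes.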

With these changes, it becomes straightforward to adapt synchronous readers 
into asynchronous ones, so long as you know the read ranges up front: cache 
all the ranges, call WaitFor<>, then hand the buffer to the existing 
synchronous reader.
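
The adaptation pattern might look roughly like the sketch below. It uses std::async in place of the cache and an in-memory string in place of a file. CountLinesSync (the "existing synchronous reader") and CountLinesAsync are hypothetical names for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <string>

// Hypothetical stand-in for a byte range; not Arrow's actual type.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Stand-in for an existing synchronous reader: it only ever sees a buffer
// that is already fully in memory. Here it counts newline characters.
int64_t CountLinesSync(const std::string& buffer) {
  return static_cast<int64_t>(std::count(buffer.begin(), buffer.end(), '\n'));
}

// Asynchronous wrapper: the range is known up front, so it is fetched in the
// background and then handed to the unchanged synchronous reader.
std::future<int64_t> CountLinesAsync(std::string file, ReadRange range) {
  assert(range.offset >= 0 && range.length >= 0);
  return std::async(std::launch::async, [file, range] {
    // Step 1: obtain the bytes (stand-in for caching the range and waiting).
    const std::string buffer =
        file.substr(static_cast<std::size_t>(range.offset),
                    static_cast<std::size_t>(range.length));
    // Step 2: run the existing synchronous reader on the in-memory buffer.
    return CountLinesSync(buffer);
  });
}
```

The synchronous reader is untouched. Only the step that obtains the buffer becomes asynchronous, which is why knowing the ranges in advance is the key requirement.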



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
