[ 
https://issues.apache.org/jira/browse/ARROW-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436976#comment-17436976
 ] 

Weston Pace commented on ARROW-14524:
-------------------------------------

ReadRangeCache works well when you plan out all your reads ahead of time, but 
this was intended for something more automatic.  As long as the filesystem 
user is making concurrent reads to the same file, the reads are coalesced in 
the same fashion as ReadRangeCache (whether or not the caller realizes this 
is possible).

Given David's feedback above, I've been thinking through what this means for 
IPC, and I agree: this probably isn't the way to go.  To generate concurrent 
reads to the same file, you're probably already doing 90% of the work needed 
to use ReadRangeCache anyway, so we can just use it directly.  I'm going to 
prototype something, and I'll close this if my prototype doesn't need it.

> [C++] Create plugging/coalescing filesystem wrapper
> ---------------------------------------------------
>
>                 Key: ARROW-14524
>                 URL: https://issues.apache.org/jira/browse/ARROW-14524
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>
> We have I/O optimizations scattered across some of our readers.  The most 
> prominent example is prebuffering in the Parquet reader.  However, these 
> techniques are rather general purpose and will apply in IPC (see ARROW-14229) 
> as well as other readers (e.g. ORC, maybe even CSV).
> This filesystem wrapper will not generally be necessary for local filesystems, 
> as the OS's filesystem schedulers are sufficient.  Most of the goals below can 
> be accomplished by simply aiming for some configurable degree of parallelism 
> (e.g. if there are already X requests in progress then start batching).
> Goals:
>  * Batch consecutive small requests into fewer large requests
>    * We could plug (configurably) small holes in read ranges as well
>  * Potentially split large requests into concurrent small requests
>  * Support for the RandomAccessFile::WillNeed command by prefetching ranges



--
This message was sent by Atlassian Jira
(v8.3.4#803005)