[ https://issues.apache.org/jira/browse/ARROW-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436275#comment-17436275 ]
Antoine Pitrou commented on ARROW-14524:
----------------------------------------
I think I lack context on what is proposed specifically here. ReadRangeCache
and WillNeed already exist, so what more is desired and why?
> [C++] Create plugging/coalescing filesystem wrapper
> ---------------------------------------------------
>
> Key: ARROW-14524
> URL: https://issues.apache.org/jira/browse/ARROW-14524
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Weston Pace
> Assignee: Weston Pace
> Priority: Major
>
> We have I/O optimizations scattered across some of our readers. The most
> prominent example is prebuffering in the Parquet reader. However, these
> techniques are rather general purpose and will apply in IPC (see ARROW-14229)
> as well as other readers (e.g. ORC, maybe even CSV).
> This filesystem wrapper will not generally be necessary for local filesystems
> as the OS's filesystem schedulers are sufficient. Most of the goals below can
> be accomplished by simply aiming for some configurable degree of parallelism
> (e.g. if there are already X requests in progress then start batching).
> Goals:
> * Batch consecutive small requests into fewer large requests
> * Plug small holes in read ranges as well (configurably)
> * Potentially split large requests into concurrent small requests
> * Support the RandomAccessFile::WillNeed command by prefetching ranges
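
To make the first two goals concrete, here is a minimal sketch of batching and hole plugging over a list of byte ranges. It is not Arrow's implementation: ReadRange, CoalesceRanges, hole_size_limit and range_size_limit are hypothetical names chosen for illustration, and splitting large requests is left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical byte range for illustration; not an Arrow type.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Merge requested ranges into fewer, larger reads.
// Two neighbouring ranges are merged when the gap ("hole") between them is at
// most hole_size_limit bytes and the merged read would not exceed
// range_size_limit bytes. Both limits are assumed tuning knobs.
std::vector<ReadRange> CoalesceRanges(std::vector<ReadRange> ranges,
                                      int64_t hole_size_limit,
                                      int64_t range_size_limit) {
  // Sort by offset so that only adjacent entries need to be compared.
  std::sort(ranges.begin(), ranges.end(),
            [](const ReadRange& a, const ReadRange& b) {
              return a.offset < b.offset;
            });

  std::vector<ReadRange> out;
  for (const ReadRange& r : ranges) {
    if (r.length <= 0) continue;  // skip empty requests
    if (!out.empty()) {
      ReadRange& last = out.back();
      const int64_t last_end = last.offset + last.length;
      const int64_t new_end = std::max(last_end, r.offset + r.length);
      const int64_t hole = r.offset - last_end;  // negative when overlapping
      if (hole <= hole_size_limit &&
          new_end - last.offset <= range_size_limit) {
        // Extend the previous read to cover this range and the hole.
        last.length = new_end - last.offset;
        continue;
      }
    }
    // Too far away or too large to merge: start a new read.
    // A single range larger than range_size_limit is passed through unsplit.
    out.push_back(r);
  }
  return out;
}
```

With a hole limit of 4 bytes and a size limit of 64 bytes, the requests {0, 10} and {12, 8} would be served by one 20-byte read, while a request at offset 100 would stay separate. A wrapper along these lines would trade a few wasted bytes for fewer round trips, which matters mostly on high-latency filesystems.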