David Li created ARROW-12683:
--------------------------------

             Summary: [C++] Enable fine-grained I/O (coalescing) in IPC reader
                 Key: ARROW-12683
                 URL: https://issues.apache.org/jira/browse/ARROW-12683
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: David Li


ARROW-11772 enables I/O coalescing in the IPC reader, but the reader operates 
at the granularity of an entire record batch; even if you're loading only a few 
columns, the entire record batch is read. When on a high-latency file system 
(e.g. S3), we may be able to get further performance improvement by traversing 
the schema and reading only the buffers we need to read. This can be combined 
with coalescing to reduce the number of I/O calls that need to be made.
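
As a rough illustration of the idea above (not Arrow's actual implementation), 
the sketch below gathers the byte ranges of only the selected columns' buffers 
and merges ranges that are close together. ByteRange, CoalesceRanges, 
RangesForColumns and the hole_limit parameter are hypothetical names made up 
for this example; a real reader would take the offsets from the record batch 
metadata.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a byte range in the file; not an Arrow type.
struct ByteRange {
  int64_t offset;
  int64_t length;
};

// Merge ranges whose gap is at most `hole_limit` bytes, so that several small
// buffer reads become one larger I/O call.
std::vector<ByteRange> CoalesceRanges(std::vector<ByteRange> ranges,
                                      int64_t hole_limit) {
  assert(hole_limit >= 0);
  std::sort(ranges.begin(), ranges.end(),
            [](const ByteRange& a, const ByteRange& b) {
              return a.offset < b.offset;
            });
  std::vector<ByteRange> out;
  for (const ByteRange& r : ranges) {
    if (r.length == 0) continue;  // nothing to read for empty buffers
    if (!out.empty()) {
      ByteRange& last = out.back();
      const int64_t last_end = last.offset + last.length;
      if (r.offset - last_end <= hole_limit) {
        // Close enough: extend the previous range to cover this one.
        const int64_t new_end = std::max(last_end, r.offset + r.length);
        last.length = new_end - last.offset;
        continue;
      }
    }
    out.push_back(r);
  }
  return out;
}

// Collect the buffer ranges of only the selected columns, then coalesce them.
// `column_buffers[i]` holds the (assumed known) ranges of column i's buffers.
std::vector<ByteRange> RangesForColumns(
    const std::vector<std::vector<ByteRange>>& column_buffers,
    const std::vector<int>& selected, int64_t hole_limit) {
  std::vector<ByteRange> needed;
  for (int col : selected) {
    const std::vector<ByteRange>& bufs = column_buffers.at(col);
    needed.insert(needed.end(), bufs.begin(), bufs.end());
  }
  return CoalesceRanges(std::move(needed), hole_limit);
}
```

With three columns laid out back to back and only the first and third 
selected, this would issue two reads and skip the middle column's bytes 
entirely, instead of reading the whole batch.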

(There may be a further saving here: instead of traversing the schema 
every time to figure out the buffer layout, we could do that once up front 
and reuse the layout afterwards.)
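
A minimal sketch of that caching idea, assuming buffers are flattened in 
depth-first order of the schema's fields as in the Arrow IPC format: the 
mapping from each top-level column to its run of buffer indices depends only 
on the schema, so it can be computed once. FieldDesc, BufferSpan and 
ComputeLayout are hypothetical names for this example, not Arrow APIs.

```cpp
#include <vector>

// Hypothetical, simplified field description: how many buffers the field
// itself owns (e.g. validity + data) plus its child fields.
struct FieldDesc {
  int num_buffers;
  std::vector<FieldDesc> children;
};

// The run of flattened buffer indices that belongs to one top-level column.
struct BufferSpan {
  int first;
  int count;
};

// Count the buffers of a field and all of its descendants.
int CountBuffers(const FieldDesc& field) {
  int n = field.num_buffers;
  for (const FieldDesc& child : field.children) {
    n += CountBuffers(child);
  }
  return n;
}

// Traverse the schema once and record, per top-level column, which buffer
// indices it owns. The result can be reused for every record batch that
// shares the schema.
std::vector<BufferSpan> ComputeLayout(const std::vector<FieldDesc>& schema) {
  std::vector<BufferSpan> spans;
  int next = 0;
  for (const FieldDesc& field : schema) {
    const int n = CountBuffers(field);
    spans.push_back({next, n});
    next += n;
  }
  return spans;
}
```

For each batch, the reader would then only need to look up the offsets of the 
cached buffer indices in that batch's metadata rather than walking the schema 
again.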

ArrayLoader already appears to perform this optimization, but it is handed 
a buffer that has already been read into memory, so no I/O is saved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
