[ https://issues.apache.org/jira/browse/ARROW-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435365#comment-17435365 ]
Weston Pace commented on ARROW-14503:
-------------------------------------
This may be a duplicate of ARROW-12683. [~lidavidm], thoughts?
> [C++][Dataset] Projection pushdown in IPC (feather) format
> ----------------------------------------------------------
>
> Key: ARROW-14503
> URL: https://issues.apache.org/jira/browse/ARROW-14503
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Weston Pace
> Priority: Major
>
> The datasets API uses the RecordBatchFileReader to read Feather files. This
> reader will always "read" the entire file. If the file is memory-mapped, this
> might not be a true read. However, the datasets API never uses memory-mapped
> files.
> This large read from RAM (or worse, disk) becomes a bottleneck for simple
> queries that load only a few columns from the dataset.
> The fix may be to modify the reader to seek out and pluck only the needed
> data, or to modify the datasets API to use memory-mapped files when
> possible (although the former approach seems more generally
> applicable).
> This is related to ARROW-8250, but that issue seems more focused on row
> filtering, while this issue is about column filtering.
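To illustrate the first option ("seek out and pluck only the needed data"), here is a minimal Python sketch. It uses a deliberately simplified, hypothetical file layout (not the real Arrow IPC/Feather format) in which a footer at the end of the file records each column's byte offset and length. The names `write_toy_file`, `CountingFile`, `read_eager` and `read_projected` are all illustrative. `read_eager` mimics the behavior described above by pulling in every column body, while `read_projected` reads the footer and then seeks to only the requested columns; the byte counter shows how much I/O a projection can skip.

```python
import io
import struct

# Hypothetical toy layout, NOT the real Arrow IPC/Feather format:
#   [column bodies back to back][footer][8-byte little-endian footer length]
# The footer maps each column name to the (offset, length) of its body.


def write_toy_file(columns):
    """Serialize a {name: bytes} mapping into the toy layout."""
    out = io.BytesIO()
    index = []
    for name, body in columns.items():
        index.append((name.encode(), out.tell(), len(body)))
        out.write(body)
    footer = struct.pack("<I", len(index))
    for name, offset, length in index:
        footer += struct.pack("<I", len(name)) + name + struct.pack("<QQ", offset, length)
    out.write(footer)
    out.write(struct.pack("<Q", len(footer)))
    return out.getvalue()


class CountingFile:
    """Random-access wrapper that tallies how many bytes are actually read."""

    def __init__(self, raw):
        self.raw = raw
        self.bytes_read = 0

    def size(self):
        return self.raw.seek(0, io.SEEK_END)

    def read_at(self, offset, length):
        self.raw.seek(offset)
        data = self.raw.read(length)
        self.bytes_read += len(data)
        return data


def read_footer(f):
    """Read only the trailer and footer; return {name: (offset, length)}."""
    size = f.size()
    (footer_len,) = struct.unpack("<Q", f.read_at(size - 8, 8))
    footer = f.read_at(size - 8 - footer_len, footer_len)
    (count,) = struct.unpack_from("<I", footer, 0)
    pos, index = 4, {}
    for _ in range(count):
        (name_len,) = struct.unpack_from("<I", footer, pos)
        pos += 4
        name = footer[pos:pos + name_len].decode()
        pos += name_len
        index[name] = struct.unpack_from("<QQ", footer, pos)
        pos += 16
    return index


def read_eager(f, wanted):
    """Baseline modeled on the issue: load every column body, then keep the wanted ones."""
    index = read_footer(f)
    loaded = {name: f.read_at(off, length) for name, (off, length) in index.items()}
    return {name: loaded[name] for name in wanted}


def read_projected(f, wanted):
    """Projection pushdown: seek to and read only the requested column bodies."""
    index = read_footer(f)
    return {name: f.read_at(*index[name]) for name in wanted}


if __name__ == "__main__":
    data = write_toy_file({"a": b"x" * 1000, "b": b"y" * 1000, "c": b"z" * 1000})
    eager, lazy = CountingFile(io.BytesIO(data)), CountingFile(io.BytesIO(data))
    assert read_eager(eager, ["b"]) == read_projected(lazy, ["b"])
    print(len(data), eager.bytes_read, lazy.bytes_read)  # 3075 3075 1075
```

With three 1000-byte columns and a query for one of them, the projected reader touches about a third of the bytes the eager reader does. A real IPC reader would presumably also have to handle multiple record batches and several buffers per column, which this sketch ignores.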
--
This message was sent by Atlassian Jira
(v8.3.4#803005)