[jira] [Commented] (ARROW-14503) [C++][Dataset] Projection pushdown in IPC (feather) format

Weston Pace (Jira) Thu, 28 Oct 2021 05:17:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435365#comment-17435365
 ]


Weston Pace commented on ARROW-14503:
-------------------------------------

This may be a duplicate of ARROW-12683.  [~lidavidm] thoughts?

> [C++][Dataset] Projection pushdown in IPC (feather) format
> ----------------------------------------------------------
>
>                 Key: ARROW-14503
>                 URL: https://issues.apache.org/jira/browse/ARROW-14503
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> The datasets API uses the RecordBatchFileReader to read feather files.  This 
> reader will always "read" the entire file.  If the file is memory-mapped, this 
> might not be a true read.  However, the datasets API never uses memory-mapped 
> files.
> This large read from RAM (or worse, disk) becomes a bottleneck for simple 
> queries that load only a few columns from the dataset.
> The fix may be to modify the reader to seek out and pluck only the needed 
> data.  Or the fix may be to modify the datasets API to use memory-mapped 
> files when possible (although the former approach seems more generally 
> applicable).
> This is related to ARROW-8250, but that issue seems more focused on row 
> filtering, while this issue is about column filtering.
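
As a rough illustration of the "seek out and pluck" idea in the description
above, here is a minimal Python sketch. It does not use Arrow or the real IPC
file layout. The toy format (column buffers back to back, then a JSON footer of
offsets, then a 4-byte footer length) and the names write_toy_file,
read_columns, and CountingFile are all hypothetical, chosen only to show how a
footer index could let a reader touch just the bytes of the projected columns.

```python
import io
import json
import struct


def write_toy_file(columns):
    """Serialize {name: bytes} into a toy columnar layout (NOT Arrow IPC).

    Layout: [column buffers...][JSON footer][uint32 footer length, little-endian]
    """
    body = io.BytesIO()
    index = {}
    for name, payload in columns.items():
        # Record where each column buffer starts and how long it is.
        index[name] = [body.tell(), len(payload)]
        body.write(payload)
    footer = json.dumps(index).encode("utf-8")
    body.write(footer)
    body.write(struct.pack("<I", len(footer)))
    return body.getvalue()


class CountingFile(io.BytesIO):
    """In-memory file that counts how many bytes were actually read."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def read_footer(f):
    """Read the footer index by seeking from the end of the file."""
    f.seek(-4, io.SEEK_END)
    (footer_len,) = struct.unpack("<I", f.read(4))
    f.seek(-4 - footer_len, io.SEEK_END)
    return json.loads(f.read(footer_len))


def read_columns(f, wanted):
    """Projection pushdown: seek to and read only the requested columns."""
    index = read_footer(f)
    out = {}
    for name in wanted:
        offset, length = index[name]
        f.seek(offset)
        out[name] = f.read(length)
    return out


# Two wide columns and one narrow one; the query needs only the narrow one.
data = write_toy_file({"a": b"\x01" * 1000, "b": b"\x02" * 1000, "c": b"\x03" * 8})
f = CountingFile(data)
projected = read_columns(f, ["c"])
print(len(data), f.bytes_read)  # total file size vs. bytes touched
```

In this toy setup, the reader touches only the footer and the one requested
buffer, so the bytes read stay small no matter how wide the skipped columns
are. A real reader would also have to handle details ignored here, such as
multiple record batches, nested fields, and compression.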



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
