[
https://issues.apache.org/jira/browse/ORC-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Panagiotis Garefalakis reassigned ORC-577:
------------------------------------------
Assignee: Panagiotis Garefalakis
> Allow row-level filtering
> -------------------------
>
> Key: ORC-577
> URL: https://issues.apache.org/jira/browse/ORC-577
> Project: ORC
> Issue Type: New Feature
> Reporter: Owen O'Malley
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, ORC filters at three levels:
> * File level
> * Stripe (64 to 256 MB) level
> * Row group (10k row) level
> The filters are specified as Sargs (Search Arguments), which have a
> relatively small vocabulary. Furthermore, they skip a set of rows only when
> they can guarantee that no row in the set can pass the filter.
> In some use cases the user needs to read a subset of the columns and apply
> more detailed row-level filters. I'd suggest that we add a new method in
> Reader.Options:
> {{setFilter(String columnNames, Predicate<VectorizedRowBatch> filter)}}
> The columns named in columnNames are read and expanded first, then the
> filter is run, and the rest of the data is read only if the predicate
> returns true.
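
The two-phase read proposed above could be sketched roughly as follows. This is a minimal, self-contained illustration and not ORC code: Batch is a hypothetical stand-in for VectorizedRowBatch, scan and readColumn are invented helpers, columns are addressed by index instead of by name, and the predicate is applied once per batch, which is one literal reading of the proposed signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of "read filter columns, test, then read the rest".
class RowFilterSketch {
    // Stand-in for VectorizedRowBatch: cols[c] stays null until column c is read.
    static final class Batch {
        final long[][] cols;
        final int size;

        Batch(int numCols, int size) {
            this.cols = new long[numCols][];
            this.size = size;
        }
    }

    // Counts column reads so the saved work is observable.
    static int columnReads = 0;

    // Simulates decoding one column of a stored batch into the in-memory batch.
    static void readColumn(long[][] stored, Batch batch, int col) {
        batch.cols[col] = stored[col].clone();
        columnReads++;
    }

    // For each stored batch: read only the filter columns, run the predicate,
    // and read the remaining columns only when the predicate returns true.
    static List<Batch> scan(List<long[][]> storedBatches,
                            int[] filterCols,
                            Predicate<Batch> filter) {
        List<Batch> result = new ArrayList<>();
        for (long[][] stored : storedBatches) {
            Batch batch = new Batch(stored.length, stored[0].length);
            for (int c : filterCols) {
                readColumn(stored, batch, c);
            }
            if (!filter.test(batch)) {
                continue; // the other columns of this batch are never read
            }
            for (int c = 0; c < stored.length; c++) {
                if (batch.cols[c] == null) {
                    readColumn(stored, batch, c);
                }
            }
            result.add(batch);
        }
        return result;
    }
}
```

With two stored batches of two columns each and a predicate on column 0 that only the second batch satisfies, the sketch performs three column reads instead of four. The saving grows with the number of non-filter columns and with how selective the filter is.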
--
This message was sent by Atlassian Jira
(v8.3.4#803005)