[ 
https://issues.apache.org/jira/browse/ORC-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Zhang reassigned ORC-577:
---------------------------------

    Assignee:     (was: Richard Zhang)

> Allow row-level filtering
> -------------------------
>
>                 Key: ORC-577
>                 URL: https://issues.apache.org/jira/browse/ORC-577
>             Project: ORC
>          Issue Type: New Feature
>            Reporter: Owen O'Malley
>            Priority: Major
>
> Currently, ORC filters at three levels:
>  * File level
>  * Stripe (64 to 256 MB) level
>  * Row group (10k row) level
> The filters are specified as Sargs (Search Arguments), which have a 
> relatively small vocabulary. Furthermore, a Sarg can only skip a set of rows 
> when it can guarantee that none of the rows in that set would pass the filter.
> There are some use cases where the user needs to read a subset of the columns 
> and apply more detailed row-level filters. I'd suggest that we add a new 
> method in Reader.Options:
> {{setFilter(String columnNames, Predicate<VectorizedRowBatch> filter)}}
> The columns named in columnNames would be read and expanded first. The 
> filter is then run on them, and the rest of the data is read only if the 
> predicate returns true.
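
The two-phase read proposed above could be sketched roughly as follows. This is only an illustration of the idea, not ORC code: `Batch`, `readRemainingColumns`, `readWithFilter` and `anyGreaterThan` are hypothetical stand-ins (with `Batch` playing the role of `VectorizedRowBatch`), and the batch-level predicate simply mirrors the signature suggested in the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class LazyFilterSketch {

    /** Hypothetical stand-in for VectorizedRowBatch: holds only the filter columns. */
    static final class Batch {
        final Map<String, long[]> columns;

        Batch(Map<String, long[]> columns) {
            this.columns = columns;
        }
    }

    /** Hypothetical stand-in for reading the remaining (non-filter) columns of a batch. */
    static String readRemainingColumns(int batchIndex) {
        return "rest-of-batch-" + batchIndex;
    }

    /**
     * Phase 1: each batch arrives with only the filter columns populated.
     * Phase 2: the remaining columns are read only if the predicate returns true.
     */
    static List<String> readWithFilter(List<Batch> batches, Predicate<Batch> filter) {
        List<String> loaded = new ArrayList<>();
        for (int i = 0; i < batches.size(); i++) {
            if (filter.test(batches.get(i))) {
                loaded.add(readRemainingColumns(i));
            }
        }
        return loaded;
    }

    /** Example predicate: keep a batch if any value in the given column exceeds the threshold. */
    static Predicate<Batch> anyGreaterThan(String column, long threshold) {
        return batch -> Arrays.stream(batch.columns.get(column)).anyMatch(v -> v > threshold);
    }

    public static void main(String[] args) {
        List<Batch> batches = Arrays.asList(
            new Batch(Collections.singletonMap("x", new long[] {1, 2, 3})),
            new Batch(Collections.singletonMap("x", new long[] {50, 500})));
        // Only the second batch has a value above 100, so only its remaining columns are read.
        System.out.println(readWithFilter(batches, anyGreaterThan("x", 100)));
    }
}
```

In this sketch the saving comes from never calling `readRemainingColumns` for a batch that the predicate rejects; how a real reader would defer that work is left open here.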



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
