Owen O'Malley created ORC-577:
---------------------------------

             Summary: Allow row-level filtering
                 Key: ORC-577
                 URL: https://issues.apache.org/jira/browse/ORC-577
             Project: ORC
          Issue Type: New Feature
            Reporter: Owen O'Malley


Currently, ORC filters at three levels:
 * File level
 * Stripe (64 to 256 MB) level
 * Row group (10k rows) level

The filters are specified as Sargs (Search Arguments), which have a relatively 
small vocabulary. Furthermore, they can only eliminate a set of rows when they 
can guarantee that none of the rows in that set pass the filter.
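To illustrate why statistics-based filtering is conservative, here is a simplified Java sketch. It is not ORC's actual SearchArgument evaluation code. The class, the `ColumnStats` record and both methods are hypothetical, and the only statistics assumed are per-row-group min/max values for a single long column. A row group whose range covers the searched value must be read even when no row in it matches.

```java
import java.util.Arrays;

public class RowGroupSkipSketch {
    // Hypothetical min/max statistics for one long column within one row group.
    record ColumnStats(long min, long max) {
        static ColumnStats of(long[] values) {
            return new ColumnStats(Arrays.stream(values).min().getAsLong(),
                                   Arrays.stream(values).max().getAsLong());
        }
    }

    // Conservative check for the predicate "x == value": the row group may be
    // skipped only if the statistics prove that no row can match.
    static boolean canSkipEquals(ColumnStats stats, long value) {
        return value < stats.min() || value > stats.max();
    }

    // Row-level evaluation: counts the rows that actually match.
    static long countEquals(long[] values, long value) {
        return Arrays.stream(values).filter(v -> v == value).count();
    }

    public static void main(String[] args) {
        long[] rowGroup = {10, 20, 100};
        ColumnStats stats = ColumnStats.of(rowGroup);
        System.out.println(canSkipEquals(stats, 500)); // true: 500 is outside [10, 100]
        System.out.println(canSkipEquals(stats, 50));  // false: the group must be read...
        System.out.println(countEquals(rowGroup, 50)); // 0: ...even though no row matches
    }
}
```

In this example the min/max check cannot rule out the value 50, so the whole row group is read. Only a row-level check shows that none of its rows pass.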

There are some use cases where the user needs to read a subset of the columns 
and apply more detailed row-level filters. I'd suggest that we add a new method 
to Reader.Options:

{{setFilter(String columnNames, Predicate<VectorizedRowBatch> filter)}}

The columns named in columnNames would be read and expanded first. The filter 
is then run, and the rest of the data is read only if the predicate returns true.
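The proposed read order can be sketched in plain Java. The sketch assumes that columnNames is a comma-separated list and that the predicate decides once per batch. `Batch`, `ColumnSource`, `scan` and `runDemo` are hypothetical stand-ins for VectorizedRowBatch and the ORC reader internals, not existing ORC or Hive APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class LazyFilterSketch {
    // Hypothetical stand-in for VectorizedRowBatch: column name -> values of one batch.
    static final class Batch {
        final Map<String, long[]> columns = new HashMap<>();
    }

    // Hypothetical column reader: returns one column's values for one batch.
    interface ColumnSource {
        long[] read(String column, int batchIndex);
    }

    // Reads the filter columns first, runs the predicate, and reads the
    // remaining columns only for batches where the predicate returns true.
    static List<Batch> scan(ColumnSource source, List<String> allColumns,
                            String filterColumnNames, Predicate<Batch> filter,
                            int batchCount) {
        // Assumption: columnNames is a comma-separated list.
        Set<String> filterColumns = Arrays.stream(filterColumnNames.split(","))
                .map(String::trim)
                .collect(Collectors.toSet());
        List<Batch> kept = new ArrayList<>();
        for (int b = 0; b < batchCount; b++) {
            Batch batch = new Batch();
            for (String column : filterColumns) {
                batch.columns.put(column, source.read(column, b));
            }
            if (!filter.test(batch)) {
                continue; // the other columns of this batch are never read
            }
            for (String column : allColumns) {
                if (!filterColumns.contains(column)) {
                    batch.columns.put(column, source.read(column, b));
                }
            }
            kept.add(batch);
        }
        return kept;
    }

    // Returns {batches kept, column reads performed} for a two-batch example.
    static int[] runDemo() {
        Map<String, long[][]> data = Map.of(
                "id", new long[][] {{1, 2, 3}, {101, 102, 103}},
                "payload", new long[][] {{10, 20, 30}, {40, 50, 60}});
        int[] reads = {0};
        ColumnSource source = (column, batchIndex) -> {
            reads[0]++; // count every column read
            return data.get(column)[batchIndex];
        };
        Predicate<Batch> anyLargeId =
                batch -> Arrays.stream(batch.columns.get("id")).anyMatch(v -> v > 100);
        List<Batch> kept = scan(source, List.of("id", "payload"), "id", anyLargeId, 2);
        return new int[] {kept.size(), reads[0]};
    }

    public static void main(String[] args) {
        int[] result = runDemo();
        System.out.println("batches kept: " + result[0]); // batches kept: 1
        System.out.println("column reads: " + result[1]); // column reads: 3
    }
}
```

Reading both columns unconditionally would take four column reads. In this sketch the payload column of the first batch is never read, because its ids fail the predicate.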



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
