[jira] [Commented] (ORC-577) Allow row-level filtering

Panagiotis Garefalakis (Jira) Tue, 26 May 2020 08:27:01 -0700


    [ 
https://issues.apache.org/jira/browse/ORC-577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116824#comment-17116824
 ]


Panagiotis Garefalakis commented on ORC-577:
--------------------------------------------

Thanks for moving this forward [~jcamachorodriguez] [~ashutoshc] -- also 
[~omalley] for the feedback!

Btw, [~omalley], would it make sense to push this (along with 622) to the 1.5 
branch? I know it's a maintenance branch, but it seems to be the most 
straightforward way (for HIVE) to take advantage of this feature -- bumping to 
1.6 or 1.7 could introduce new incompatibilities.

> Allow row-level filtering
> -------------------------
>
>                 Key: ORC-577
>                 URL: https://issues.apache.org/jira/browse/ORC-577
>             Project: ORC
>          Issue Type: New Feature
>            Reporter: Owen O'Malley
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>             Fix For: 1.7.0
>
>         Attachments: RowFilterBenchBoolean.out, RowFilterBenchDecimal.out, 
> RowFilterBenchDouble.out, RowFilterBenchString.out, 
> RowFilterBenchTimestamp.out
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, ORC filters at three levels:
>  * File level
>  * Stripe (64 to 256 MB) level
>  * Row group (10k row) level
> The filters are specified as Sargs (Search Arguments), which have a 
> relatively small vocabulary. Furthermore, they only filter sets of rows if 
> they can guarantee that none of the rows can pass the filter.
> There are some use cases where the user needs to read a subset of the columns 
> and apply more detailed row-level filters. I'd suggest that we add a new 
> method in Reader.Options:
> {{setFilter(String columnNames, Predicate<VectorizedRowBatch> filter)}}
> where the columns named in columnNames are read and expanded first, then the 
> filter is run, and the rest of the data is read only if the predicate returns 
> true.
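
The two-phase read proposed in the description above can be sketched roughly as 
follows. This is a toy model only, not ORC's reader code: the class 
RowFilterSketch, the Batch type (a stand-in for VectorizedRowBatch), the read 
method, and the columnsDecoded counter are all hypothetical names made up for 
illustration, and "decoding" a column is simulated by copying a reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of the proposed two-phase read; NOT the actual ORC implementation. */
public class RowFilterSketch {

    /** Hypothetical stand-in for VectorizedRowBatch: column name -> values. */
    static final class Batch {
        final Map<String, long[]> columns;
        Batch(Map<String, long[]> columns) { this.columns = columns; }
    }

    /** Counts simulated column decodes so the saved work is visible. */
    static int columnsDecoded = 0;

    /**
     * Mirrors the shape of the proposed setFilter(columnNames, filter):
     * columnNames is a comma-separated list of the columns the filter needs.
     */
    static List<Batch> read(List<Map<String, long[]>> storedBatches,
                            String columnNames,
                            Predicate<Batch> filter) {
        List<Batch> result = new ArrayList<>();
        for (Map<String, long[]> stored : storedBatches) {
            // Phase 1: decode only the columns the filter needs.
            Map<String, long[]> decoded = new HashMap<>();
            for (String name : columnNames.split(",")) {
                decoded.put(name.trim(), stored.get(name.trim()));
                columnsDecoded++;
            }
            Batch batch = new Batch(decoded);

            // Run the filter; a rejected batch never pays for the other columns.
            if (!filter.test(batch)) {
                continue;
            }

            // Phase 2: decode the remaining columns for accepted batches.
            for (Map.Entry<String, long[]> e : stored.entrySet()) {
                if (!decoded.containsKey(e.getKey())) {
                    decoded.put(e.getKey(), e.getValue());
                    columnsDecoded++;
                }
            }
            result.add(batch);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, long[]> first = new HashMap<>();
        first.put("id", new long[] {1, 2});
        first.put("payload", new long[] {10, 20});
        Map<String, long[]> second = new HashMap<>();
        second.put("id", new long[] {500, 600});
        second.put("payload", new long[] {30, 40});

        // Keep a batch only if at least one id exceeds 100.
        Predicate<Batch> anyLargeId =
            b -> Arrays.stream(b.columns.get("id")).anyMatch(v -> v > 100);

        List<Batch> kept = read(Arrays.asList(first, second), "id", anyLargeId);
        System.out.println(kept.size() + " batch(es) kept, "
            + columnsDecoded + " column chunks decoded");
    }
}
```

In this toy run the first batch is rejected after decoding only its "id" 
column, so 3 of the 4 column chunks are decoded. Unlike Sargs, the predicate 
here sees the actual values rather than statistics, which is what allows more 
detailed conditions.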



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
