[ 
https://issues.apache.org/jira/browse/HIVE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731681#comment-13731681
 ] 

Gopal V commented on HIVE-4246:
-------------------------------

The IN() implementation currently does a linear search over the predicate leaves.

Since we are only checking ranges and not actual membership, it would be better 
to store the IN() literals as a sorted list and perform a binary search.

In most cases this will enable a fast path using the list's min/max (its first 
and last elements once sorted).

But in the corner case where the binary search gives the same insertion point 
for the block's min & max and matches no element, no literal falls inside the 
block's range, so we can skip the block.
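
A rough sketch of the idea, not the code in the attached patch: the class and 
method names below are made up for illustration, the literals are assumed to be 
longs that have already been sorted, and blockMin/blockMax stand in for the 
column statistics of one block of rows.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hive or ORC.
final class InListRangeCheck {
    private InListRangeCheck() {}

    /**
     * Returns false only when no literal can fall inside [blockMin, blockMax],
     * i.e. when the block can be skipped. sortedLiterals must be sorted ascending.
     */
    static boolean mightContain(long[] sortedLiterals, long blockMin, long blockMax) {
        if (sortedLiterals.length == 0) {
            return false;
        }
        // Fast path: the block lies entirely below or above the list's min/max.
        if (blockMax < sortedLiterals[0]
                || blockMin > sortedLiterals[sortedLiterals.length - 1]) {
            return false;
        }
        // Arrays.binarySearch returns index >= 0 on a match,
        // otherwise -(insertionPoint) - 1.
        int lo = Arrays.binarySearch(sortedLiterals, blockMin);
        if (lo >= 0) {
            return true;
        }
        int hi = Arrays.binarySearch(sortedLiterals, blockMax);
        if (hi >= 0) {
            return true;
        }
        // Neither bound matched. Equal encoded results mean equal insertion
        // points, so no literal sits between blockMin and blockMax.
        return lo != hi;
    }
}
```

With literals {10, 20, 30}, a block covering [11, 19] gets the same insertion 
point for both bounds and is skipped, while [15, 25] straddles 20 and is kept.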
                
> Implement predicate pushdown for ORC
> ------------------------------------
>
>                 Key: HIVE-4246
>                 URL: https://issues.apache.org/jira/browse/HIVE-4246
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-4246.D11415.1.patch
>
>
> By using the push down predicates from the table scan operator, ORC can skip 
> over 10,000 rows at a time that won't satisfy the predicate. This will help a 
> lot, especially if the file is sorted by the column that is used in the 
> predicate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
