[ https://issues.apache.org/jira/browse/HIVE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731681#comment-13731681 ]
Gopal V commented on HIVE-4246:
-------------------------------

The IN() implementation currently does a linear search over the predicate leaves. Since we are only checking the range and not actual membership, it would be better to store the IN() values as a sorted list and perform a binary search. In most cases this enables a fast path based on the list's min/max. In the corner case where the binary search places the block's min and max at the same insertion point and neither matches an element, no value in the list falls inside the block's range, so we can skip the block.

> Implement predicate pushdown for ORC
> ------------------------------------
>
>                 Key: HIVE-4246
>                 URL: https://issues.apache.org/jira/browse/HIVE-4246
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: HIVE-4246.D11415.1.patch
>
>
> By using the push down predicates from the table scan operator, ORC can skip
> over 10,000 rows at a time that won't satisfy the predicate. This will help a
> lot, especially if the file is sorted by the column that is used in the
> predicate.
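The sorted-list check proposed in the comment above could look roughly like the sketch below. This is only an illustration, not code from the attached patch: the class name InRangeCheck, the method canSkipBlock, and the choice of long values are all assumptions made for the example. It relies on java.util.Arrays.binarySearch, which returns -(insertionPoint) - 1 when the key is absent.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hive or ORC: decides whether a block whose
// column statistics are [min, max] can be skipped for an IN() predicate.
final class InRangeCheck {

    // sortedValues must be sorted ascending (the IN() list, sorted once up front).
    static boolean canSkipBlock(long[] sortedValues, long min, long max) {
        if (sortedValues.length == 0) {
            return true; // an empty IN() list can match nothing
        }
        // Fast path: the block range lies entirely outside the list's min/max.
        if (max < sortedValues[0] || min > sortedValues[sortedValues.length - 1]) {
            return true;
        }
        int lo = Arrays.binarySearch(sortedValues, min);
        if (lo >= 0) {
            return false; // min itself is in the list
        }
        int hi = Arrays.binarySearch(sortedValues, max);
        if (hi >= 0) {
            return false; // max itself is in the list
        }
        // Both keys are absent. Equal encoded insertion points mean that no
        // list value lies between min and max, so the block can be skipped.
        return lo == hi;
    }

    public static void main(String[] args) {
        long[] values = {10L, 20L, 30L};
        System.out.println(canSkipBlock(values, 12L, 18L));
        System.out.println(canSkipBlock(values, 5L, 15L));
    }
}
```

With the values 10, 20, 30, a block covering [12, 18] gets the same insertion point for both bounds and is skipped, while a block covering [5, 15] straddles 10 and must be read. Each block costs two binary searches instead of a scan over the whole list.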