[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758127#action_12758127 ]

Prasad Chakka commented on HIVE-417:
------------------------------------

I don't think it makes much sense unless there is some clustering or sorting
property. If there is clustering and sorting, and the selectivity of a query is
much higher than 10%, then it makes sense to store this metadata along with the
data instead of in a separate block. The 10% threshold may be larger for Hive,
but the point still stands. In the OLAP case, data changes seldom, and this kind
of metadata is much smaller than the data itself, so the overhead of storing it
is negligible.

Something similar is done in DB2 Multi-Dimensional Clustering (MDC), where
whole blocks (disk blocks) are skipped if their key values don't match the query.
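As a rough illustration of the MDC-style approach, the sketch below is a toy
model, not DB2's implementation. It assumes a hypothetical (region, month,
amount) schema, and build_cells and query are made-up names. Rows are placed so
that each block holds a single combination of dimension values (a "cell"). A
small per-dimension index then maps each value to block ids, so a query with
predicates on the dimensions reads only the matching blocks and skips all
others outright.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str, int]  # hypothetical schema: (region, month, amount)


def build_cells(rows: List[Row], block_capacity: int):
    """Place rows so each block holds exactly one (region, month) cell, and
    build one block index per dimension: value -> set of block ids."""
    blocks: List[List[Row]] = []
    open_block: Dict[Tuple[str, str], int] = {}  # cell -> block currently being filled
    region_index: Dict[str, Set[int]] = defaultdict(set)
    month_index: Dict[str, Set[int]] = defaultdict(set)
    for row in rows:
        cell = (row[0], row[1])
        bid = open_block.get(cell)
        if bid is None or len(blocks[bid]) >= block_capacity:
            bid = len(blocks)  # start a new block for this cell
            blocks.append([])
            open_block[cell] = bid
            region_index[row[0]].add(bid)
            month_index[row[1]].add(bid)
        blocks[bid].append(row)
    return blocks, region_index, month_index


def query(blocks, region_index, month_index, region: str, month: str):
    """Read only the blocks whose cell matches both predicates.

    Every row in a candidate block belongs to the requested cell, so no
    per-row filtering is needed; all other blocks are never touched."""
    candidates = region_index.get(region, set()) & month_index.get(month, set())
    rows = [r for bid in sorted(candidates) for r in blocks[bid]]
    return rows, len(candidates)


if __name__ == "__main__":
    regions, months = ["east", "west"], ["jan", "feb", "mar"]
    data = [(r, m, i) for i in range(12) for r in regions for m in months]
    blocks, by_region, by_month = build_cells(data, block_capacity=5)
    rows, read = query(blocks, by_region, by_month, "east", "feb")
    print(f"{len(rows)} rows from {read} of {len(blocks)} blocks")
```

Here the 72 rows land in 18 blocks (three per cell), and the east/feb query
reads 3 of them. The index holds block ids rather than row ids, which keeps it
much smaller than the data. The trade-off is that the clustering has to be
maintained when data is loaded.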

> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.