[jira] Commented: (HIVE-417) Implement Indexing in Hive

He Yongqiang (JIRA) Sun, 17 May 2009 07:00:10 -0700

    [ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710198#action_12710198 ]


He Yongqiang commented on HIVE-417:
-----------------------------------

Thanks, Prasad, for the detailed description of the index design. 
I have several questions:
(1) 
Is the index sort-based? A sort-based single-column index can be very 
useful for queries involving that column, both point queries and range 
queries. A sort-based multi-column index, however, cannot be used for queries 
that do not constrain the column used as the primary sort order in the index. 
For example, suppose we create a sort-based index on table1(col1,col2,col3). 
The index uses col1 as primary sort order, col2 as secondary sort order, and 
col3...
We can use this index to accelerate queries like:
1) select * from table1 where col1>2 and col2<34
2) select * from table1 where col1<34 and col3>45
3) select * from table1 where col1>23
but we cannot use it for queries like:
4) select * from table1 where col2>34 and col3<3
5) select * from table1 where col2=34
6) select * from table1 where col3<45
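
The leading-column rule above can be illustrated with a small in-memory 
sketch. This is not Hive code and not the proposed design; the function names 
(build_index, usable, scan_col1_greater) and the sample rows are hypothetical, 
and the index is assumed to be a list of keys sorted by (col1, col2, col3).

```python
import bisect

def build_index(rows, cols):
    """Return (key, row_position) entries sorted by the index columns."""
    return sorted((tuple(row[c] for c in cols), pos) for pos, row in enumerate(rows))

def usable(index_cols, predicate_cols):
    """A sort-based index helps only if the primary sort column is constrained."""
    return index_cols[0] in predicate_cols

def scan_col1_greater(index, bound):
    """Binary-search to the first entry with col1 > bound; all later entries qualify."""
    leading = [key[0] for key, _ in index]
    start = bisect.bisect_right(leading, bound)
    return index[start:]

# Hypothetical sample data standing in for table1.
rows = [
    {"col1": 30, "col2": 10, "col3": 50},
    {"col1": 1, "col2": 40, "col3": 2},
    {"col1": 25, "col2": 34, "col3": 60},
]
cols = ["col1", "col2", "col3"]
index = build_index(rows, cols)

# Query 3 (col1>23): one binary search, then a contiguous scan.
matches = [pos for _, pos in scan_col1_greater(index, 23)]

# Queries 2 and 4: only the former constrains the primary sort column.
query2_ok = usable(cols, {"col1", "col3"})
query4_ok = usable(cols, {"col2", "col3"})
```

For queries 4 to 6 the qualifying entries are scattered throughout the sorted 
order, so nothing short of scanning the whole index would find them.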

(2) 
Should we consider using indexes to accelerate queries that join several 
tables? For example, suppose we have two tables:
user(userid,name,address, age,title,company);
click(userid,url,datetime);
And now we have a query like:
select url  from user, click where user.userid=click.userid and 
user.name="user_name" and datetime between last month;  which selects the list 
of URLs the specified user visited in the last month. 
If we had an index such as: create index user_url on table user(name), 
click(datetime) where user.userid=click.userid, then the above query could be 
accelerated.
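
One way to picture such a join index is as a precomputed map from user.name to 
that user's clicks, kept sorted by datetime. The sketch below is only an 
illustration under that assumption; build_join_index and urls_between are 
hypothetical names, dates are assumed to be ISO-formatted strings, and nothing 
here reflects actual Hive syntax or internals.

```python
import bisect
from collections import defaultdict

def build_join_index(users, clicks):
    """Join user and click on userid once, keyed by name and sorted by datetime."""
    name_by_id = {u["userid"]: u["name"] for u in users}
    index = defaultdict(list)
    for c in clicks:
        name = name_by_id.get(c["userid"])
        if name is not None:
            index[name].append((c["datetime"], c["url"]))
    for entries in index.values():
        entries.sort()
    return index

def urls_between(index, name, start, end):
    """Answer the query with one map lookup and a binary search on datetime."""
    entries = index.get(name, [])
    dates = [d for d, _ in entries]
    lo = bisect.bisect_left(dates, start)
    hi = bisect.bisect_right(dates, end)
    return [url for _, url in entries[lo:hi]]

# Hypothetical sample data.
users = [{"userid": 1, "name": "alice"}, {"userid": 2, "name": "bob"}]
clicks = [
    {"userid": 1, "url": "a.example", "datetime": "2009-04-02"},
    {"userid": 1, "url": "b.example", "datetime": "2009-05-01"},
    {"userid": 2, "url": "c.example", "datetime": "2009-04-10"},
]
join_index = build_join_index(users, clicks)
april_urls = urls_between(join_index, "alice", "2009-04-01", "2009-04-30")
```

The join is paid for once at index-build time, so the query itself never has 
to scan either table.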

(3) 
Indexes can also be used in group-by aggregation queries. Should we consider 
those as well?
(4)
Another feature is integrating a Lucene index with Hive. Ashish suggested 
integrating Katta. I took a look at Katta, and I think it may not be necessary 
to include it. If we include it, Hive users will have to deploy Katta and 
ZooKeeper in their clusters. I think we can integrate Lucene internally 
without touching Katta.


> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.2.0, 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
