[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prafulla Tekawade updated HIVE-417:
-----------------------------------

    Attachment: indexing_with_ql_rewrites_trunk_953221.patch

Hi All,
Here is the first set of changes for the query rewrite module in Hive,
along with an initial rewrite rule that transforms group-by and distinct
queries so that they make use of indexes.
Note that the patch also contains the createIndex* related changes from Yongqiang.
Here are the files that I have touched for the rewrite-related changes:
ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
-- hasIndex and getIndex related APIs
ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java
ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
ql/src/java/org/apache/hadoop/hive/ql/rewrite/HiveRewriteEngine.java
-- Rewrite engine infrastructure
ql/src/java/org/apache/hadoop/hive/ql/rewrite/rules/GbToCompactSumIdxRewrite.java
-- The actual rewrite rule, which performs the above-mentioned optimization
ql/src/java/org/apache/hadoop/hive/ql/rewrite/rules/HiveRwRule.java
ql/src/java/org/apache/hadoop/hive/ql/rewrite/rules/HiveRwRuleContext.java
ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
-- Unit test including various queries.
ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out

I've tested some queries end-to-end with this GbToIdx rewrite rule on.
It is working well.

All rewrites can be disabled with the hive.ql.rewrite boolean flag,
and this particular rewrite can be enabled or disabled with the
hive.ql.rewrite.gbtoidx boolean flag.
The default value for hive.ql.rewrite is true,
and that for hive.ql.rewrite.gbtoidx is false.
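
As a rough illustration of how the two flags might combine, here is a minimal sketch. The flag names and defaults are the ones given in this message; the class RewriteFlags, its methods, and the use of a plain Map in place of Hive's configuration object are hypothetical, and it assumes the global flag gates the rule-specific one. It is not code from the patch.

```java
import java.util.Map;

// Hypothetical helper, not part of the patch: models how the two flags
// described in this message could be combined.
final class RewriteFlags {
    // Flag names and default values as stated in the message.
    static final String REWRITE = "hive.ql.rewrite";             // default: true
    static final String GB_TO_IDX = "hive.ql.rewrite.gbtoidx";   // default: false

    // Reads a boolean flag from a plain map, falling back to the default
    // when the flag is not set.
    private static boolean flag(Map<String, String> conf, String name, boolean dflt) {
        String value = conf.get(name);
        return value == null ? dflt : Boolean.parseBoolean(value.trim());
    }

    // Assumption: the GbToIdx rule runs only when the rewrite engine as a
    // whole is enabled AND the rule-specific flag is switched on.
    static boolean isGbToIdxEnabled(Map<String, String> conf) {
        return flag(conf, REWRITE, true) && flag(conf, GB_TO_IDX, false);
    }
}
```

With an empty map the rule is off, because hive.ql.rewrite.gbtoidx defaults to false; setting only that flag to true turns it on, and setting hive.ql.rewrite to false turns it off again.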

Currently there are the following limitations (which are being worked on):
1. The rewrite engine does not invoke rewrites recursively.
   It currently rewrites only the top-level QB.
2. The GbToIdx rewrite is disabled for all queries that have a WHERE clause. We
   can support certain WHERE clauses in which all column references are index
   key columns.
3. We need some API to know whether indexes are up-to-date. We need to
   do this rewrite only when the index has up-to-date data.
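
The condition mentioned in limitation 2 could be checked along the following lines. This is only a sketch of the idea, not code from the patch: the class WhereClauseCheck and its method are hypothetical, and it assumes the column references of the WHERE clause and the index key columns have already been collected as sets of names.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the patch: decides whether a WHERE clause
// could be supported by the rewrite, given the columns it references.
final class WhereClauseCheck {
    // Returns true when every column referenced in the WHERE clause is one of
    // the index key columns. Names are compared case-insensitively, on the
    // assumption that column names are not case-sensitive.
    static boolean coveredByIndexKeys(Set<String> whereColumns, Set<String> indexKeyColumns) {
        Set<String> keys = new HashSet<>();
        for (String key : indexKeyColumns) {
            keys.add(key.toLowerCase(Locale.ROOT));
        }
        for (String column : whereColumns) {
            if (!keys.contains(column.toLowerCase(Locale.ROOT))) {
                return false; // references a column the index does not cover
            }
        }
        return true; // also true when there is no WHERE clause at all
    }
}
```

For an index on (key, dt), a WHERE clause that references only key would pass this check, while one that also references a non-key column such as value would not.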

Let me know your review comments.
I am using a fork of the apache-git repository on GitHub for development.
All these changes are also available at http://github.com/prafullat/hive

> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, 
> hive-indexing.3.patch, indexing_with_ql_rewrites_trunk_953221.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
