[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prafulla Tekawade updated HIVE-417:
-----------------------------------

    Attachment: indexing_with_ql_rewrites_trunk_953221.patch

Hi All,

Here is the first set of changes for the query rewrite module in Hive, together with an initial rewrite rule that transforms group-by and distinct queries so that they make use of indexes.
Note that the patch also contains the createIndex*-related changes from Yongqiang.

Here are the files that I have touched for the rewrite-related changes:

ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java
  -- hasIndex and getIndex related APIs
ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java
ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
ql/src/java/org/apache/hadoop/hive/ql/rewrite/HiveRewriteEngine.java
  -- Rewrite engine infrastructure
ql/src/java/org/apache/hadoop/hive/ql/rewrite/rules/GbToCompactSumIdxRewrite.java
  -- Actual rewrite rule which does the above-mentioned optimization
ql/src/java/org/apache/hadoop/hive/ql/rewrite/rules/HiveRwRule.java
ql/src/java/org/apache/hadoop/hive/ql/rewrite/rules/HiveRwRuleContext.java
ql/src/test/queries/clientpositive/ql_rewrite_gbtoidx.q
  -- Unit test including various queries
ql/src/test/results/clientpositive/ql_rewrite_gbtoidx.q.out

I have tested some queries end-to-end with the GbToIdx rewrite rule turned on, and it is working well.
All rewrites can be disabled with the hive.ql.rewrite boolean flag, and this particular rewrite can be enabled or disabled with the hive.ql.rewrite.gbtoidx boolean flag.
The default value of hive.ql.rewrite is true and that of hive.ql.rewrite.gbtoidx is false.

There are currently the following limitations (which are being worked on):
1. The rewrite engine does not invoke rewrites recursively. It currently rewrites only the top-level QB.
2. The GbToIdx rewrite is disabled for all queries that have a "where clause". We can support certain "where clauses" in which all colrefs are index key columns.
3. We need some API to know if indexes are up-to-date or not.
We need to do this rewrite only when the index has up-to-date data.

Let me know your review comments.
I am using a fork of the apache-git repository on GitHub for development. All these changes are also available at http://github.com/prafullat/hive

> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing.3.patch, indexing_with_ql_rewrites_trunk_953221.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
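
For readers of the update above, the flag semantics and the first two limitations can be summarized as a small standalone sketch. This is not code from the patch: the class RewriteFlagSketch, the helper methods, and the use of a plain Map in place of Hive's configuration object are all assumptions made for illustration. Only the two flag names and their defaults come from the message.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not taken from the attached patch.
class RewriteFlagSketch {
    // Flag names and defaults as stated in the message.
    static final String REWRITE = "hive.ql.rewrite";            // default: true
    static final String GB_TO_IDX = "hive.ql.rewrite.gbtoidx";  // default: false

    // Hypothetical helper: read a boolean flag, falling back to a default.
    static boolean flag(Map<String, String> conf, String key, boolean dflt) {
        String value = conf.get(key);
        return value == null ? dflt : Boolean.parseBoolean(value);
    }

    // Hypothetical helper: would the GbToIdx rewrite be attempted?
    // Both flags must be on, the query block must be the top-level QB
    // (limitation 1), and it must have no where clause (limitation 2).
    static boolean gbToIdxApplies(Map<String, String> conf,
                                  boolean isTopLevelQb,
                                  boolean hasWhereClause) {
        return flag(conf, REWRITE, true)
            && flag(conf, GB_TO_IDX, false)
            && isTopLevelQb
            && !hasWhereClause;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // With defaults, the specific rule is off.
        System.out.println(gbToIdxApplies(conf, true, false)); // false

        conf.put(GB_TO_IDX, "true");
        System.out.println(gbToIdxApplies(conf, true, false)); // true
        System.out.println(gbToIdxApplies(conf, true, true));  // false (where clause)
        System.out.println(gbToIdxApplies(conf, false, false)); // false (nested QB)

        // The global switch overrides the per-rule flag.
        conf.put(REWRITE, "false");
        System.out.println(gbToIdxApplies(conf, true, false)); // false
    }
}
```

The sketch does not model the third limitation (index freshness), since the message notes that no API for that check exists yet.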