[ 
https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889345#action_12889345
 ] 

John Sichi commented on HIVE-417:
---------------------------------

Here are some preliminary comments on the metastore work.  We can move on to 
the plugin design next week and start getting all of this into a doc.

* We should support a property on the index which controls the name of the 
index table, and only generate an index table name automatically in the case 
where the user doesn't supply the property.  For this, we'll need to add 
property key/values to the grammar (IDXPROPERTIES like TBLPROPERTIES and 
SERDEPROPERTIES?).

* The grammar supports control over the tableFileFormat for the index table; 
what about other attributes such as row format, location, and TBLPROPERTIES?  
Some of these may be dictated by the index implementation, but it may be useful 
to override them in some cases (same as tableFileFormat).

* Is the partitioning for the index independent of the partitioning for the 
table?  Don't we need to allow control over this in the grammar?

* I think we should track the status of the index (when was the last time it 
was rebuilt, if ever) so that we know whether it is fresh with respect to the 
base table data.  How should we model this in such a way that it takes 
per-partition indexing into account?

* Some metastore followups to be logged separately:  COMMENT clause on index 
definition; DESCRIBE INDEX; SHOW INDEXES; dealing with base table columns being 
dropped/renamed out from under the index.

* For generating the index table structure, we'll need to move that into the 
plugin (rather than in Hive.java), since each index will need a different table 
structure (or no table structure at all).

* Test queries:  remember to add ORDER BY for determinism.  Also, I'm not sure 
whether it is safe to use /tmp in the local file system (it may not exist, e.g. 
on Windows).  I used it in hbase_bulk.m, but that uses a mini HDFS cluster (not 
the local file system).

* Dropping a table with an index on it currently gives the exception below (in 
Derby; I didn't test MySQL yet).  Same for attempting to drop an index table 
directly (instead of dropping the index).  The second case should either fail 
with a meaningful exception, or implicitly drop the index definition when the 
index table is dropped.

hive> create table t1(i int);
OK
hive> create index q type compact on table t1(i);
OK
hive> drop table t1;
FAILED: Error in metadata: javax.jdo.JDODataStoreException: Exception thrown 
flushing changes to datastore
NestedThrowables:
java.sql.BatchUpdateException: DELETE on table 'TBLS' caused a violation of 
foreign key constraint 'INDEXS_FK3' for key (12).  The statement has been 
rolled back.
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask

hive> create table t5(i int);
OK
hive> create index r type compact on table t5(i);
OK
hive> drop table default__t5_r__;
FAILED: Error in metadata: javax.jdo.JDODataStoreException: Exception thrown 
flushing changes to datastore
NestedThrowables:
java.sql.BatchUpdateException: DELETE on table 'TBLS' caused a violation of 
foreign key constraint 'INDEXS_FK2' for key (17).  The statement has been 
rolled back.
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask
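
The naming rule from the first bullet could be sketched roughly as follows, in 
Python rather than Hive's Java to keep it short.  The property key 
index.table.name is hypothetical (no such key is defined yet), and the 
generated form is only inferred from the default__t5_r__ name in the session 
above.

```python
def resolve_index_table_name(db, table, index, idx_properties=None,
                             name_key="index.table.name"):
    """Return the user-supplied index table name, or generate one.

    name_key is a hypothetical IDXPROPERTIES key used for illustration only.
    """
    props = idx_properties or {}
    supplied = props.get(name_key)
    if supplied:
        # The user chose a name via the index properties; honor it.
        return supplied
    # Fall back to the generated form seen above: <db>__<table>_<index>__
    return f"{db}__{table}_{index}__"
```

With no properties, resolve_index_table_name("default", "t5", "r") yields 
default__t5_r__; with {"index.table.name": "t5_idx"} it yields t5_idx.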

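For the freshness question, one possible model (an assumption for discussion, 
not a proposal for the actual metastore schema) is to record a last-rebuild 
time per base table partition and compare it with that partition's 
last-modified time.  The class and field names below are hypothetical, and 
timestamps are plain integers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexStatus:
    # Base table partition name -> time the index was last rebuilt for it.
    last_rebuilt: Dict[str, int] = field(default_factory=dict)

    def record_rebuild(self, partition: str, when: int) -> None:
        self.last_rebuilt[partition] = when

    def stale_partitions(self, base_modified: Dict[str, int]) -> List[str]:
        """Partitions never indexed, or modified after their last rebuild."""
        stale = []
        for partition, modified in base_modified.items():
            rebuilt = self.last_rebuilt.get(partition)
            if rebuilt is None or rebuilt < modified:
                stale.append(partition)
        return sorted(stale)
```

An unpartitioned table could use a single fixed key, so the same check covers 
both cases.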

> Implement Indexing in Hive
> --------------------------
>
>                 Key: HIVE-417
>                 URL: https://issues.apache.org/jira/browse/HIVE-417
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
>            Reporter: Prasad Chakka
>            Assignee: He Yongqiang
>         Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, 
> hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, 
> hive-indexing.5.thrift.patch, idx2.png, 
> indexing_with_ql_rewrites_trunk_953221.patch
>
>
> Implement indexing on Hive so that lookup and range queries are efficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
