[jira] Commented: (HIVE-1644) use filter pushdown for automatically accessing indexes

He Yongqiang (JIRA) Tue, 01 Mar 2011 16:30:30 -0800

    [ https://issues.apache.org/jira/browse/HIVE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001208#comment-13001208 ]


He Yongqiang commented on HIVE-1644:
------------------------------------

Took a quick look at the HIVE-1644.4.patch itself.

Some comments:
1) Add a testcase for CombineHiveInputFormat.
2) In the new testcase, the newly added conf "hive.optimize.autoindex" is not
used?
3) I think there is already an API in Hive.java for getting all indexes on a
table, no? Please double-check. If not, rename getIndexesOnTable to getIndexes.
4) In GenMRTableScan1.java, it is not good to hardcode the inputformat name.
Why not just use indexClassName?
5) In ExecDriver.java, it is also not good to hardcode the conf name
"hive.index.compact.file", because the bitmap index may want to use a different
name. So maybe this work should be passed to some index-type-specific class.
6) In generateIndexQuery, the temp directory is not random, so it could
conflict with others (in the same query). Also, the dir path should not be
generated there; it should be generated in the optimizer, which has global
control. And I think "insert overwrite directory 'full_path_to_a_dir' select
.." would fail if full_path_to_a_dir does not exist (or its parent does not
exist). Please check this.
7) In generateIndexQuery, what is this used for?
+    ParseContext indexQueryPctx = RewriteParseContextGenerator.generateOperatorTree(pctx.getConf(), qlCommand);
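A minimal sketch of what point 6 asks for, assuming the optimizer owns one scratch root per query: every generated index query asks a single allocator for its output directory, so two index scans in the same query cannot collide, and the parent directory is created up front so "insert overwrite directory" does not trip over a missing parent. `IndexTmpDirAllocator`, `allocate` and `ensureParentExists` are hypothetical names (not from the patch or from Hive), and a local `java.nio.file.Path` stands in for an HDFS path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper owned by the optimizer (not a Hive class): hands out one
 * unique output directory per generated index query, all under a single
 * per-query scratch root (assumed to be provided by the caller).
 */
final class IndexTmpDirAllocator {
    private final Path queryScratchRoot;
    private final AtomicInteger counter = new AtomicInteger();

    IndexTmpDirAllocator(Path queryScratchRoot) {
        this.queryScratchRoot = queryScratchRoot;
    }

    /** Returns a fresh path; the counter plus a random UUID keeps index scans apart. */
    Path allocate(String indexName) {
        String leaf = indexName + "_" + counter.getAndIncrement() + "_" + UUID.randomUUID();
        return queryScratchRoot.resolve(leaf);
    }

    /** Creates the parent (and any missing ancestors) of the given output directory. */
    void ensureParentExists(Path dir) {
        Path parent = dir.getParent();
        if (parent == null) {
            return; // nothing to create for a bare relative name
        }
        try {
            Files.createDirectories(parent);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On HDFS the equivalent of `Files.createDirectories` would be Hadoop's `FileSystem.mkdirs`; keeping allocation in one place is what gives the optimizer the "global control" mentioned above.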


Also, today the index optimizer runs before the task tree is broken up. So the
index scan task is generated before the task for the original table scan, which
makes it very hard to hook them together. The only way I can think of is to
remember the op id of the original table scan, and do another pass to hook them
together after the task tree is broken up. But I think that is too hacky.

Maybe a better way is to do it in the physical optimizer. In the physical
optimizer, Hive presents a task tree, and the optimizer can go through each
task and do the same thing (since each task still contains its operator tree).
It will be much easier to manage task dependencies there, and I think most of
the code will stay the same. For complex queries, this approach will also be
cleaner.
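To illustrate the physical-optimizer suggestion, here is a rough sketch under simplifying assumptions: `SimpleTask` and `IndexTaskResolver` are hypothetical stand-ins (not Hive's task or resolver classes), each task just lists the tables its operator tree scans, and the table-to-index lookup is a plain map. Because the pass runs after the task tree exists, attaching the index scan as a dependency of the original scan task is a single call rather than a later op-id match.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, heavily simplified stand-in for a task in the task tree. */
final class SimpleTask {
    final String name;
    final List<String> scannedTables;                    // tables read by this task's operator tree
    final List<SimpleTask> parents = new ArrayList<>();  // tasks that must finish first
    final List<SimpleTask> children = new ArrayList<>(); // tasks that wait on this one

    SimpleTask(String name, List<String> scannedTables) {
        this.name = name;
        this.scannedTables = scannedTables;
    }

    /** Makes this task run only after {@code parent} has finished. */
    void addDependency(SimpleTask parent) {
        parents.add(parent);
        parent.children.add(this);
    }
}

/** Hypothetical physical-optimizer pass that runs once the whole task tree exists. */
final class IndexTaskResolver {
    private final Map<String, String> indexByTable; // base table -> index table (assumed lookup)

    IndexTaskResolver(Map<String, String> indexByTable) {
        this.indexByTable = indexByTable;
    }

    /** Puts an index-scan task in front of every task scanning an indexed table; returns the new roots. */
    List<SimpleTask> resolve(List<SimpleTask> roots) {
        Set<SimpleTask> all = new LinkedHashSet<>();
        for (SimpleTask root : roots) {
            collect(root, all);
        }
        List<SimpleTask> indexTasks = new ArrayList<>();
        for (SimpleTask task : all) {
            for (String table : task.scannedTables) {
                String index = indexByTable.get(table);
                if (index == null) {
                    continue; // no usable index for this table
                }
                SimpleTask indexTask = new SimpleTask("index scan of " + index, List.of(index));
                task.addDependency(indexTask); // original scan now waits for the index query
                indexTasks.add(indexTask);
            }
        }
        all.addAll(indexTasks);
        List<SimpleTask> newRoots = new ArrayList<>();
        for (SimpleTask task : all) {
            if (task.parents.isEmpty()) {
                newRoots.add(task); // tasks with no dependency left can start first
            }
        }
        return newRoots;
    }

    /** Depth-first walk over the task DAG, visiting each task once. */
    private static void collect(SimpleTask task, Set<SimpleTask> visited) {
        if (!visited.add(task)) {
            return;
        }
        for (SimpleTask child : task.children) {
            collect(child, visited);
        }
    }
}
```

A real pass would also have to switch the original scan to the index input format and hand it the index query's output directory; that part is left out here.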


> use filter pushdown for automatically accessing indexes
> -------------------------------------------------------
>
>                 Key: HIVE-1644
>                 URL: https://issues.apache.org/jira/browse/HIVE-1644
>             Project: Hive
>          Issue Type: Improvement
>          Components: Indexing
>    Affects Versions: 0.7.0
>            Reporter: John Sichi
>            Assignee: Russell Melick
>         Attachments: HIVE-1644.1.patch, HIVE-1644.2.patch, HIVE-1644.3.patch, 
> HIVE-1644.4.patch
>
>
> HIVE-1226 provides utilities for analyzing filters which have been pushed 
> down to a table scan.  The next step is to use these for selecting available 
> indexes and generating access plans for those indexes.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
