Shyam Rai created HIVE-20473: -------------------------------- Summary: Optimization for materialized views Key: HIVE-20473 URL: https://issues.apache.org/jira/browse/HIVE-20473 Project: Hive Issue Type: Improvement Components: Hive Affects Versions: 3.0.0 Environment: Can be reproduced on a Single node pseudo cluster. Reporter: Shyam Rai
Optimizer is taking advantage of materialized view only when the query syntax matches the way view was created. Here is an example. *Source table on which materialized views are created* {code} +----------------------------------------------------+ | createtab_stmt | +----------------------------------------------------+ | CREATE TABLE `mysource`( | | `id` int, | | `name` string, | | `start_date` date) | | ROW FORMAT SERDE | | 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' | | WITH SERDEPROPERTIES ( | | 'field.delim'=',', | | 'serialization.format'=',') | | STORED AS INPUTFORMAT | | 'org.apache.hadoop.mapred.TextInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' | | LOCATION | | 'hdfs://xlhive3.openstacklocal:8020/warehouse/tablespace/managed/hive/mysource' | | TBLPROPERTIES ( | | 'bucketing_version'='2', | | 'transactional'='true', | | 'transactional_properties'='insert_only', | | 'transient_lastDdlTime'='1535392655') | +----------------------------------------------------+ {code} One of the materialized views "view_1" is created to fetch the data between IDs 1 and 2 using this statement {code} select `mysource`.`id`, `mysource`.`name`, `mysource`.`start_date` from `default`.`mysource` where `mysource`.`id` between 1 and 2 {code} *When a SELECT is executed against the source table using the following SELECT statement, this works fine and can be validated with the explain plan. * {code} 0: jdbc:hive2://localhost:10000/default> explain select * from mysource where id between 1 and 2; INFO : Compiling command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c): explain select * from mysource where id between 1 and 2 INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c); Time taken: 0.224 seconds INFO : Executing command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c): explain select * from mysource where id between 1 and 2 INFO : Starting task [Stage-1:EXPLAIN] in serial mode INFO : Completed executing command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c); Time taken: 0.006 seconds INFO : OK +----------------------------------------------------+ | Explain | +----------------------------------------------------+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: default.view_1 | | Select Operator | | expressions: id (type: int), name (type: string), start_date (type: date) | | outputColumnNames: _col0, _col1, _col2 | | ListSink | | | +----------------------------------------------------+ {code} If the rewrite of the same SELECT is written using >= and <=, which should yield the same result, the optimizer does not take advantage of the materialized view, unless of course we create another view with this >= and <= syntax. {code} 0: jdbc:hive2://localhost:10000/default> explain select * from mysource where id >= 1 and <=2; Error: Error while compiling statement: FAILED: ParseException line 1:49 cannot recognize input near '<=' '2' '<EOF>' in expression specification (state=42000,code=40000) 0: jdbc:hive2://localhost:10000/default> explain select * from mysource where id >= 1 and id <=2; INFO : Compiling command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca): explain select * from mysource where id >= 1 and id <=2 INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca); Time taken: 0.226 seconds INFO : Executing command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca): explain select * from mysource where id >= 1 and id <=2 INFO : Starting task [Stage-1:EXPLAIN] in serial mode INFO : Completed executing command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca); Time taken: 0.005 seconds INFO : OK +----------------------------------------------------+ | Explain | +----------------------------------------------------+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: mysource | | filterExpr: ((id >= 1) and (id <= 2)) (type: boolean) | | Filter Operator | | predicate: ((id >= 1) and (id <= 2)) (type: boolean) | | Select Operator | | expressions: id (type: int), name (type: string), start_date (type: date) | | outputColumnNames: _col0, _col1, _col2 | | ListSink | | | +----------------------------------------------------+ {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)