[
https://issues.apache.org/jira/browse/HIVE-20473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jesus Camacho Rodriguez resolved HIVE-20473.
--------------------------------------------
Fix Version/s: 4.0.0
Resolution: Duplicate
> Optimization for materialized views
> -----------------------------------
>
> Key: HIVE-20473
> URL: https://issues.apache.org/jira/browse/HIVE-20473
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Affects Versions: 3.0.0
> Environment: Can be reproduced on a Single node pseudo cluster.
> Reporter: Shyam Rai
> Priority: Critical
> Labels: materializedviews
> Fix For: 4.0.0
>
>
> Optimizer is taking advantage of materialized view only when the query syntax
> matches the way view was created. Here is an example.
> *Source table on which materialized views are created*
> {code}
> +----------------------------------------------------+
> | createtab_stmt |
> +----------------------------------------------------+
> | CREATE TABLE `mysource`( |
> | `id` int, |
> | `name` string, |
> | `start_date` date) |
> | ROW FORMAT SERDE |
> | 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
> | WITH SERDEPROPERTIES ( |
> | 'field.delim'=',', |
> | 'serialization.format'=',') |
> | STORED AS INPUTFORMAT |
> | 'org.apache.hadoop.mapred.TextInputFormat' |
> | OUTPUTFORMAT |
> | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
> | LOCATION |
> |
> 'hdfs://xlhive3.openstacklocal:8020/warehouse/tablespace/managed/hive/mysource'
> |
> | TBLPROPERTIES ( |
> | 'bucketing_version'='2', |
> | 'transactional'='true', |
> | 'transactional_properties'='insert_only', |
> | 'transient_lastDdlTime'='1535392655') |
> +----------------------------------------------------+
> {code}
> One of the materialized views "view_1" is created to fetch the data between
> IDs 1 and 2 using this statement
> {code}
> select `mysource`.`id`, `mysource`.`name`, `mysource`.`start_date` from
> `default`.`mysource` where `mysource`.`id` between 1 and 2
> {code}
> *When a SELECT is executed against the source table using the following
> SELECT statement, this works fine and can be validated with the explain plan.
> *
> {code}
> 0: jdbc:hive2://localhost:10000/default> explain select * from mysource where
> id between 1 and 2;
> INFO : Compiling
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c):
> explain select * from mysource where id between 1 and 2
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain,
> type:string, comment:null)], properties:null)
> INFO : Completed compiling
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c);
> Time taken: 0.224 seconds
> INFO : Executing
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c):
> explain select * from mysource where id between 1 and 2
> INFO : Starting task [Stage-1:EXPLAIN] in serial mode
> INFO : Completed executing
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c);
> Time taken: 0.006 seconds
> INFO : OK
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | STAGE DEPENDENCIES: |
> | Stage-0 is a root stage |
> | |
> | STAGE PLANS: |
> | Stage: Stage-0 |
> | Fetch Operator |
> | limit: -1 |
> | Processor Tree: |
> | TableScan |
> | alias: default.view_1 |
> | Select Operator |
> | expressions: id (type: int), name (type: string), start_date
> (type: date) |
> | outputColumnNames: _col0, _col1, _col2 |
> | ListSink |
> | |
> +----------------------------------------------------+
> {code}
> If the rewrite of the same SELECT is written using >= and <=, which should
> yield the same result, the optimizer does not take advantage of the
> materialized view, unless of course we create another view with this >= and
> <= syntax.
> {code}
> 0: jdbc:hive2://localhost:10000/default> explain select * from mysource where
> id >= 1 and <=2;
> Error: Error while compiling statement: FAILED: ParseException line 1:49
> cannot recognize input near '<=' '2' '<EOF>' in expression specification
> (state=42000,code=40000)
> 0: jdbc:hive2://localhost:10000/default> explain select * from mysource where
> id >= 1 and id <=2;
> INFO : Compiling
> command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca):
> explain select * from mysource where id >= 1 and id <=2
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain,
> type:string, comment:null)], properties:null)
> INFO : Completed compiling
> command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca);
> Time taken: 0.226 seconds
> INFO : Executing
> command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca):
> explain select * from mysource where id >= 1 and id <=2
> INFO : Starting task [Stage-1:EXPLAIN] in serial mode
> INFO : Completed executing
> command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca);
> Time taken: 0.005 seconds
> INFO : OK
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | STAGE DEPENDENCIES: |
> | Stage-0 is a root stage |
> | |
> | STAGE PLANS: |
> | Stage: Stage-0 |
> | Fetch Operator |
> | limit: -1 |
> | Processor Tree: |
> | TableScan |
> | alias: mysource |
> | filterExpr: ((id >= 1) and (id <= 2)) (type: boolean) |
> | Filter Operator |
> | predicate: ((id >= 1) and (id <= 2)) (type: boolean) |
> | Select Operator |
> | expressions: id (type: int), name (type: string), start_date
> (type: date) |
> | outputColumnNames: _col0, _col1, _col2 |
> | ListSink |
> | |
> +----------------------------------------------------+
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)