[jira] [Resolved] (HIVE-20473) Optimization for materialized views

Jesus Camacho Rodriguez (Jira) Wed, 23 Jun 2021 12:34:07 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-20473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jesus Camacho Rodriguez resolved HIVE-20473.
--------------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Duplicate

> Optimization for materialized views
> -----------------------------------
>
>                 Key: HIVE-20473
>                 URL: https://issues.apache.org/jira/browse/HIVE-20473
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 3.0.0
>         Environment: Can be reproduced on a Single node pseudo cluster. 
>            Reporter: Shyam Rai
>            Priority: Critical
>              Labels: materializedviews
>             Fix For: 4.0.0
>
>
> Optimizer is taking advantage of materialized view only when the query syntax 
> matches the way view was created. Here is an example.
> *Source table on which materialized views are created*
> {code}
> +----------------------------------------------------+
> |                   createtab_stmt                   |
> +----------------------------------------------------+
> | CREATE TABLE `mysource`(                           |
> |   `id` int,                                        |
> |   `name` string,                                   |
> |   `start_date` date)                               |
> | ROW FORMAT SERDE                                   |
> |   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  |
> | WITH SERDEPROPERTIES (                             |
> |   'field.delim'=',',                               |
> |   'serialization.format'=',')                      |
> | STORED AS INPUTFORMAT                              |
> |   'org.apache.hadoop.mapred.TextInputFormat'       |
> | OUTPUTFORMAT                                       |
> |   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
> | LOCATION                                           |
> |   
> 'hdfs://xlhive3.openstacklocal:8020/warehouse/tablespace/managed/hive/mysource'
>  |
> | TBLPROPERTIES (                                    |
> |   'bucketing_version'='2',                         |
> |   'transactional'='true',                          |
> |   'transactional_properties'='insert_only',        |
> |   'transient_lastDdlTime'='1535392655')            |
> +----------------------------------------------------+
> {code}
> One of the materialized views "view_1" is created to fetch the data between 
> IDs 1 and 2 using this statement
> {code}
> select `mysource`.`id`, `mysource`.`name`, `mysource`.`start_date` from 
> `default`.`mysource` where `mysource`.`id` between 1 and 2
> {code}
> *When a SELECT is executed against the source table using the following 
> SELECT statement, this works fine and can be validated with the explain plan.
> *
> {code}
> 0: jdbc:hive2://localhost:10000/default> explain select * from mysource where 
> id between 1 and 2;
> INFO  : Compiling 
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c): 
> explain select * from mysource where id between 1 and 2
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, 
> type:string, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c); 
> Time taken: 0.224 seconds
> INFO  : Executing 
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c): 
> explain select * from mysource where id between 1 and 2
> INFO  : Starting task [Stage-1:EXPLAIN] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20180828062847_b313e0aa-686c-42f5-94e2-252dd836501c); 
> Time taken: 0.006 seconds
> INFO  : OK
> +----------------------------------------------------+
> |                      Explain                       |
> +----------------------------------------------------+
> | STAGE DEPENDENCIES:                                |
> |   Stage-0 is a root stage                          |
> |                                                    |
> | STAGE PLANS:                                       |
> |   Stage: Stage-0                                   |
> |     Fetch Operator                                 |
> |       limit: -1                                    |
> |       Processor Tree:                              |
> |         TableScan                                  |
> |           alias: default.view_1                    |
> |           Select Operator                          |
> |             expressions: id (type: int), name (type: string), start_date 
> (type: date) |
> |             outputColumnNames: _col0, _col1, _col2 |
> |             ListSink                               |
> |                                                    |
> +----------------------------------------------------+
> {code}
> If the rewrite of the same SELECT is written using >= and <=, which should 
> yield the same result, the optimizer does not take advantage of the 
> materialized view, unless of course we create another view with this >= and 
> <= syntax. 
> {code}
> 0: jdbc:hive2://localhost:10000/default> explain select * from mysource where 
> id >= 1 and <=2;
> Error: Error while compiling statement: FAILED: ParseException line 1:49 
> cannot recognize input near '<=' '2' '<EOF>' in expression specification 
> (state=42000,code=40000)
> 0: jdbc:hive2://localhost:10000/default> explain select * from mysource where 
> id >= 1 and id <=2;
> INFO  : Compiling 
> command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca): 
> explain select * from mysource where id >= 1 and id <=2
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, 
> type:string, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca); 
> Time taken: 0.226 seconds
> INFO  : Executing 
> command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca): 
> explain select * from mysource where id >= 1 and id <=2
> INFO  : Starting task [Stage-1:EXPLAIN] in serial mode
> INFO  : Completed executing 
> command(queryId=hive_20180828063123_7a5bf6dd-5999-44d7-a2cf-3b2e634371ca); 
> Time taken: 0.005 seconds
> INFO  : OK
> +----------------------------------------------------+
> |                      Explain                       |
> +----------------------------------------------------+
> | STAGE DEPENDENCIES:                                |
> |   Stage-0 is a root stage                          |
> |                                                    |
> | STAGE PLANS:                                       |
> |   Stage: Stage-0                                   |
> |     Fetch Operator                                 |
> |       limit: -1                                    |
> |       Processor Tree:                              |
> |         TableScan                                  |
> |           alias: mysource                          |
> |           filterExpr: ((id >= 1) and (id <= 2)) (type: boolean) |
> |           Filter Operator                          |
> |             predicate: ((id >= 1) and (id <= 2)) (type: boolean) |
> |             Select Operator                        |
> |               expressions: id (type: int), name (type: string), start_date 
> (type: date) |
> |               outputColumnNames: _col0, _col1, _col2 |
> |               ListSink                             |
> |                                                    |
> +----------------------------------------------------+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Resolved] (HIVE-20473) Optimization for materialized views

Reply via email to