[ https://issues.apache.org/jira/browse/HIVE-22735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-22735:
----------------------------------
    Attachment: HIVE-22735.2.patch

> TopNKey operator deduplication
> ------------------------------
>
>                 Key: HIVE-22735
>                 URL: https://issues.apache.org/jira/browse/HIVE-22735
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-22735.1.patch, HIVE-22735.2.patch
>
>
> In some cases, more than one TNK (Top N Key) operator in the same operator
> tree has the same key expressions, or the key expressions differ only by a
> constant column. In most of these cases, only one TNK operator should remain.
> {code}
> +----------------------------------------------------+
> |                      Explain                       |
> +----------------------------------------------------+
> | Plan not optimized by CBO.                         |
> |                                                    |
> | Vertex dependency in root stage                    |
> | Map 1 <- Reducer 8 (BROADCAST_EDGE)                |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) |
> | Reducer 3 <- Reducer 2 (SIMPLE_EDGE)               |
> | Reducer 4 <- Reducer 3 (SIMPLE_EDGE)               |
> | Reducer 8 <- Map 7 (CUSTOM_SIMPLE_EDGE)            |
> |                                                    |
> | Stage-0                                            |
> |   Fetch Operator                                   |
> |     limit:50                                       |
> |     Stage-1                                        |
> |       Reducer 4 vectorized                         |
> |       File Output Operator [FS_127]                |
> |         Limit [LIM_126] (rows=50 width=538)        |
> |           Number of rows:50                        |
> |           Select Operator [SEL_125] (rows=190 width=538) |
> |             Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"] |
> |           <-Reducer 3 [SIMPLE_EDGE]                |
> |             SHUFFLE [RS_30]                        |
> |               Select Operator [SEL_29] (rows=190 width=538) |
> |                 Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"] |
> |                 Group By Operator [GBY_28] (rows=190 width=538) |
> |                   Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"],aggregations:["avg(VALUE._col0)","avg(VALUE._col1)","avg(VALUE._col2)","avg(VALUE._col3)"],keys:KEY._col0, KEY._col1, KEY._col2 |
> |                 <-Reducer 2 [SIMPLE_EDGE]          |
> |                   SHUFFLE [RS_27]                  |
> |                     PartitionCols:_col0, _col1, _col2 |
> |                     Group By Operator [GBY_26] (rows=190 width=1134) |
> |                       Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6"],aggregations:["avg(_col9)","avg(_col11)","avg(_col18)","avg(_col12)"],keys:_col102, _col93, 0L |
> |                       Top N Key Operator [TNK_60] (rows=127 width=234) |
> |                         keys:_col102, _col93, 0L,top n:50 |
> |                         Select Operator [SEL_25] (rows=127 width=234) |
> |                           Output:["_col9","_col11","_col12","_col18","_col93","_col102"] |
> |                           Top N Key Operator [TNK_58] (rows=127 width=234) |
> |                             keys:_col102, _col93,top n:50 |
> |                             Filter Operator [FIL_49] (rows=127 width=234) |
> |                               predicate:((_col22 = _col38) and (_col1 = _col101) and (_col6 = _col69) and (_col3 = _col26)) |
> |                               Map Join Operator [MAPJOIN_102] (rows=2044 width=232) |
> |                                 Conds:MAPJOIN_101._col1=RS_123.i_item_sk(Inner),Output:["_col1","_col3","_col6","_col9","_col11","_col12","_col18","_col22","_col26","_col38","_col69","_col93","_col101","_col102"] |
> |                               <-Map 9 [BROADCAST_EDGE] vectorized |
> |                                 BROADCAST [RS_123] |
> |                                   PartitionCols:i_item_sk |
> |                                   Filter Operator [FIL_122] (rows=204000 width=108) |
> |                                     predicate:i_item_sk is not null |
> |                                     TableScan [TS_4] (rows=204000 width=108) |
> |                                       tpcds_bin_partitioned_orc_100@item,item, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["i_item_sk","i_item_id"] |
> |                               <-Map Join Operator [MAPJOIN_101] (rows=2010 width=118) |
> |                                   Conds:MAPJOIN_100._col6=RS_107.s_store_sk(Inner),Output:["_col1","_col3","_col6","_col9","_col11","_col12","_col18","_col22","_col26","_col38","_col69","_col93"] |
> |                                 <-Map 7 [BROADCAST_EDGE] vectorized |
> |                                   PARTITION_ONLY_SHUFFLE [RS_107] |
> |                                     PartitionCols:s_store_sk |
> |                                     Filter Operator [FIL_106] (rows=402 width=94) |
> |                                       predicate:s_store_sk is not null |
> |                                       TableScan [TS_3] (rows=402 width=94) |
> |                                         tpcds_bin_partitioned_orc_100@store,store, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["s_store_sk","s_state"] |
> |                                 <-Map Join Operator [MAPJOIN_100] (rows=9604000 width=24) |
> |                                     Conds:MERGEJOIN_99._col22=RS_118.d_date_sk(Inner),Output:["_col1","_col3","_col6","_col9","_col11","_col12","_col18","_col22","_col26","_col38"] |
> |                                   <-Map 6 [BROADCAST_EDGE] vectorized |
> |                                     BROADCAST [RS_118] |
> |                                       PartitionCols:d_date_sk |
> |                                       Filter Operator [FIL_117] (rows=73049 width=8) |
> |                                         predicate:d_date_sk is not null |
> |                                         TableScan [TS_2] (rows=73049 width=8) |
> |                                           tpcds_bin_partitioned_orc_100@date_dim,date_dim, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["d_date_sk"] |
> |                                     Dynamic Partitioning Event Operator [EVENT_121] (rows=1 width=8) |
> |                                       Group By Operator [GBY_120] (rows=1 width=8) |
> |                                         Output:["_col0"],keys:_col0 |
> |                                         Select Operator [SEL_119] (rows=73049 width=8) |
> |                                           Output:["_col0"] |
> |                                            Please refer to the previous Filter Operator [FIL_117] |
> |                                   <-Merge Join Operator [MERGEJOIN_99] (rows=9604000 width=16) |
> |                                       Conds:RS_114.ss_cdemo_sk=RS_116.cd_demo_sk(Inner),Output:["_col1","_col3","_col6","_col9","_col11","_col12","_col18","_col22","_col26"] |
> |                                     <-Map 1 [SIMPLE_EDGE] vectorized |
> |                                       SHUFFLE [RS_114] |
> |                                         PartitionCols:ss_cdemo_sk |
> |                                         Filter Operator [FIL_113] (rows=235814137 width=353) |
> |                                           predicate:(ss_cdemo_sk is not null and ss_store_sk is not null and ss_item_sk is not null and ss_store_sk BETWEEN DynamicValue(RS_17_store_s_store_sk_min) AND DynamicValue(RS_17_store_s_store_sk_max) and in_bloom_filter(ss_store_sk, DynamicValue(RS_17_store_s_store_sk_bloom_filter))) |
> |                                           TableScan [TS_0] (rows=275041999 width=723) |
> |                                             tpcds_bin_partitioned_orc_100@store_sales,store_sales, ACID table,Tbl:COMPLETE,Col:PARTIAL,Output:["ss_item_sk","ss_cdemo_sk","ss_store_sk","ss_quantity","ss_list_price","ss_sales_price","ss_coupon_amt"] |
> |                                           <-Reducer 8 [BROADCAST_EDGE] vectorized |
> |                                             BROADCAST [RS_112] |
> |                                               Group By Operator [GBY_111] (rows=1 width=24) |
> |                                                 Output:["_col0","_col1","_col2"],aggregations:["min(VALUE._col0)","max(VALUE._col1)","bloom_filter(VALUE._col2, expectedEntries=1000000)"] |
> |                                     <-Map 5 [SIMPLE_EDGE] vectorized |
> |                                       SHUFFLE [RS_116] |
> |                                         PartitionCols:cd_demo_sk |
> |                                         Filter Operator [FIL_115] (rows=1920800 width=8) |
> |                                           predicate:cd_demo_sk is not null |
> |                                           TableScan [TS_1] (rows=1920800 width=8) |
> |                                             tpcds_bin_partitioned_orc_100@customer_demographics,customer_demographics, ACID table,Tbl:COMPLETE,Col:COMPLETE,Output:["cd_demo_sk"] |
> |                                                    |
> +----------------------------------------------------+
> {code}
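
The rule described in the issue can be illustrated with a small, self-contained sketch. This is not the code from the attached patches and does not use Hive's operator classes: `TopNKeyDedupSketch`, `KeyExpr`, `Tnk`, `isDuplicate` and `deduplicate` are hypothetical names invented for illustration. It assumes that two TNK operators on the same branch are duplicates when their "top n" values match and their key lists are equal once constant keys (such as `0L` in TNK_60) are dropped. It also assumes column names are not remapped between the two operators, and it ignores sort direction and null ordering, which a real optimizer rule would have to compare as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; not Hive's TopNKeyOperator or optimizer code.
final class TopNKeyDedupSketch {

    /** A key expression as printed in the plan, flagged when it is a constant. */
    static final class KeyExpr {
        final String text;
        final boolean constant;
        KeyExpr(String text, boolean constant) {
            this.text = text;
            this.constant = constant;
        }
    }

    /** A simplified TNK operator: id, "top n" value and key expressions. */
    static final class Tnk {
        final String id;
        final int topN;
        final List<KeyExpr> keys;
        Tnk(String id, int topN, KeyExpr... keys) {
            this.id = id;
            this.topN = topN;
            this.keys = Arrays.asList(keys);
        }
    }

    static KeyExpr col(String name) { return new KeyExpr(name, false); }
    static KeyExpr constant(String literal) { return new KeyExpr(literal, true); }

    /** Returns the key texts in order, skipping constant keys. */
    static List<String> nonConstantKeys(Tnk tnk) {
        List<String> result = new ArrayList<>();
        for (KeyExpr key : tnk.keys) {
            if (!key.constant) {
                result.add(key.text);
            }
        }
        return result;
    }

    /** Assumed duplicate test: same "top n" and same keys once constants are ignored. */
    static boolean isDuplicate(Tnk a, Tnk b) {
        return a.topN == b.topN && nonConstantKeys(a).equals(nonConstantKeys(b));
    }

    /** Keeps the first TNK of each duplicate group on a branch and returns the kept ids. */
    static List<String> deduplicate(List<Tnk> branch) {
        List<Tnk> kept = new ArrayList<>();
        for (Tnk candidate : branch) {
            boolean duplicate = false;
            for (Tnk existing : kept) {
                if (isDuplicate(existing, candidate)) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(candidate);
            }
        }
        List<String> ids = new ArrayList<>();
        for (Tnk tnk : kept) {
            ids.add(tnk.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // The two TNK operators from the plan above, listed from the Group By downwards.
        Tnk tnk60 = new Tnk("TNK_60", 50, col("_col102"), col("_col93"), constant("0L"));
        Tnk tnk58 = new Tnk("TNK_58", 50, col("_col102"), col("_col93"));
        System.out.println(deduplicate(Arrays.asList(tnk60, tnk58))); // prints [TNK_60]
    }
}
```

Keeping the first operator encountered is an arbitrary choice made for this sketch; the issue text only says that one TNK operator should remain and does not say which.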



