[
https://issues.apache.org/jira/browse/HIVE-29322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047601#comment-18047601
]
Indhumathi Muthumurugesh edited comment on HIVE-29322 at 12/26/25 10:11 AM:
----------------------------------------------------------------------------
[~zabetak] [~okumin] sorry for the late response. I’m just back from the winter
break and catching up on this.
Please find the answers in the below attached document.
[TopNkey_Analysis|https://docs.google.com/document/d/1dh55JpguHwUm2jk3yIAvoIA1xt9pLhwPIpY1ZtE9ygw/edit?usp=sharing]
was (Author: indhumathi27):
[~zabetak] [~okumin] sorry for the late response. I’m just back from the winter
break and catching up on this.
Please find the answers in the below attached document.
[TopNkey_Analysis_doc|https://docs.google.com/document/d/1BvFN79zcz-Z2UJXFnWgdhe6ux4gFen-8WR6hYuYi4WA/edit?usp=sharing]
> Performance Degraded with Top N Key Pushdown for Order by Limit Queries
> -----------------------------------------------------------------------
>
> Key: HIVE-29322
> URL: https://issues.apache.org/jira/browse/HIVE-29322
> Project: Hive
> Issue Type: Bug
> Reporter: Indhumathi Muthumurugesh
> Assignee: Indhumathi Muthumurugesh
> Priority: Major
> Labels: pull-request-available
>
> Env: Used latest docker image: docker pull apache/hive:4.2.0
> Table Schema Used:
> create table seg1 (a string,b string,c string,d string,e string,f string,g
> string,h int,i string,j string,k string,l string,m int) ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',' ;
> create table seg1_orc (a string,b string,c string,d string,e string,f
> string,g string,h int,i string,j string,k string,l string,m int) stored as
> orc;
>
> -- Generated and Inserted 16,777,216 records
>
> -- Query
> select * from seg1_orc order by h limit 100;
>
> Plan:
> {code:java}
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:100 |
> | Stage-1 |
> | Reducer 2 vectorized |
> | File Output Operator [FS_12] |
> | Limit [LIM_11] (rows=100 width=1101) |
> | Number of rows:100 |
> | Select Operator [SEL_10] (rows=16777216 width=1101) |
> |
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12"]
> |
> | <-Map 1 [SIMPLE_EDGE] vectorized |
> | SHUFFLE [RS_9] |
> | Select Operator [SEL_8] (rows=16777216 width=1101) |
> |
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12"]
> |
> | Top N Key Operator [TNK_7] (rows=16777216 width=1101) |
> | keys:h,top n:100 |
> | TableScan [TS_0] (rows=16777216 width=1101) |
> |
> default@seg1_orc,seg1_orc,Tbl:COMPLETE,Col:COMPLETE,Output:["a","b","c","d","e","f","g","h","i","j","k","l","m"]
> |
> | |
> +----------------------------------------------------+ {code}
> Time Taken: *100 rows selected (40.806 seconds)*
> Set hive.optimize.topnkey=false;
> Time Taken:
> *100 rows selected (7.802 seconds)*
>
> *Issue:*
> It is seen that Top n key is processing all rows in the Map phase.
> Set hive.optimize.topnkey=false;
> {code:java}
> INFO :
> ----------------------------------------------------------------------------------------------
> INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms)
> INPUT_RECORDS OUTPUT_RECORDS
> INFO :
> ----------------------------------------------------------------------------------------------
> INFO : Map 1 7061.00 0 0
> 16,777,216 100
> INFO : Reducer 2 0.00 0 0
> 100 0
> INFO :
> ----------------------------------------------------------------------------------------------
> {code}
> Set hive.optimize.topnkey=true;
> {code:java}
> INFO :
> ----------------------------------------------------------------------------------------------
> INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms)
> INPUT_RECORDS OUTPUT_RECORDS
> INFO :
> ----------------------------------------------------------------------------------------------
> INFO : Map 1 31805.00 0 0
> 16,777,216 16,777,216
> INFO : Reducer 2 8156.00 0 0
> 16,777,216 0 {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)