[ 
https://issues.apache.org/jira/browse/HIVE-29322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047601#comment-18047601
 ] 

Indhumathi Muthumurugesh edited comment on HIVE-29322 at 12/26/25 10:11 AM:
----------------------------------------------------------------------------

[~zabetak] [~okumin] sorry for the late response. I’m just back from the winter 
break and catching up on this.

Please find the answers in the below attached document. 

[TopNkey_Analysis|https://docs.google.com/document/d/1dh55JpguHwUm2jk3yIAvoIA1xt9pLhwPIpY1ZtE9ygw/edit?usp=sharing]


was (Author: indhumathi27):
[~zabetak] [~okumin] sorry for the late response. I’m just back from the winter 
break and catching up on this.

Please find the answers in the below attached document. 

[TopNkey_Analysis_doc|https://docs.google.com/document/d/1BvFN79zcz-Z2UJXFnWgdhe6ux4gFen-8WR6hYuYi4WA/edit?usp=sharing]

> Performance Degraded with Top N Key Pushdown for Order by Limit Queries
> -----------------------------------------------------------------------
>
>                 Key: HIVE-29322
>                 URL: https://issues.apache.org/jira/browse/HIVE-29322
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Indhumathi Muthumurugesh
>            Assignee: Indhumathi Muthumurugesh
>            Priority: Major
>              Labels: pull-request-available
>
> Env: Used latest docker image: docker pull apache/hive:4.2.0
> Table Schema Used:
> create table seg1 (a string,b string,c string,d string,e string,f string,g 
> string,h int,i string,j string,k string,l string,m int) ROW FORMAT DELIMITED 
> FIELDS TERMINATED BY ',' ;
> create table seg1_orc (a string,b string,c string,d string,e string,f 
> string,g string,h int,i string,j string,k string,l string,m int) stored as 
> orc;
>  
> -- Generated and Inserted  16,777,216 records
>  
> -- Query
> select * from seg1_orc order by h limit 100;
>  
> Plan:
> {code:java}
> +----------------------------------------------------+
> |                      Explain                       |
> +----------------------------------------------------+
> | Plan optimized by CBO.                             |
> |                                                    |
> | Vertex dependency in root stage                    |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE)                   |
> |                                                    |
> | Stage-0                                            |
> |   Fetch Operator                                   |
> |     limit:100                                      |
> |     Stage-1                                        |
> |       Reducer 2 vectorized                         |
> |       File Output Operator [FS_12]                 |
> |         Limit [LIM_11] (rows=100 width=1101)       |
> |           Number of rows:100                       |
> |           Select Operator [SEL_10] (rows=16777216 width=1101) |
> |             
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12"]
>  |
> |           <-Map 1 [SIMPLE_EDGE] vectorized         |
> |             SHUFFLE [RS_9]                         |
> |               Select Operator [SEL_8] (rows=16777216 width=1101) |
> |                 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12"]
>  |
> |                 Top N Key Operator [TNK_7] (rows=16777216 width=1101) |
> |                   keys:h,top n:100                 |
> |                   TableScan [TS_0] (rows=16777216 width=1101) |
> |                     
> default@seg1_orc,seg1_orc,Tbl:COMPLETE,Col:COMPLETE,Output:["a","b","c","d","e","f","g","h","i","j","k","l","m"]
>  |
> |                                                    |
> +----------------------------------------------------+ {code}
> Time Taken:  *100 rows selected (40.806 seconds)*
> Set hive.optimize.topnkey=false;
> Time Taken:  
> *100 rows selected (7.802 seconds)*
>  
> *Issue:*
> It is seen that Top n key is processing all rows in the Map phase.
> Set hive.optimize.topnkey=false;
> {code:java}
> INFO  : 
> ----------------------------------------------------------------------------------------------
> INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   
> INPUT_RECORDS   OUTPUT_RECORDS
> INFO  : 
> ----------------------------------------------------------------------------------------------
> INFO  :      Map 1           7061.00              0              0      
> 16,777,216              100
> INFO  :  Reducer 2              0.00              0              0            
>  100                0
> INFO  : 
> ----------------------------------------------------------------------------------------------
>  {code}
> Set hive.optimize.topnkey=true;
> {code:java}
> INFO  : 
> ----------------------------------------------------------------------------------------------
> INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   
> INPUT_RECORDS   OUTPUT_RECORDS
> INFO  : 
> ----------------------------------------------------------------------------------------------
> INFO  :      Map 1          31805.00              0              0      
> 16,777,216       16,777,216
> INFO  :  Reducer 2           8156.00              0              0      
> 16,777,216                0 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to