saihemanth-cloudera commented on code in PR #4569:
URL: https://github.com/apache/hive/pull/4569#discussion_r1364577385


##########
standalone-metastore/metastore-server/src/main/sql/mssql/hive-schema-4.0.0-beta-2.mssql.sql:
##########
@@ -661,6 +661,8 @@ ALTER TABLE NOTIFICATION_LOG ADD CONSTRAINT 
NOTIFICATION_LOG_PK PRIMARY KEY (NL_
 
 CREATE UNIQUE INDEX NOTIFICATION_LOG_EVENT_ID ON NOTIFICATION_LOG (EVENT_ID);
 
+CREATE UNIQUE INDEX NOTIFICATION_LOG_UNIQUE_DB ON NOTIFICATION_LOG (DB_NAME, 
EVENT_ID);

Review Comment:
   @VenuReddy2103 - Good finding. The index I introduced is only used if there 
is event_id value is absolute
   `mysql> explain analyze select * from NOTIFICATION_LOG where db_name='mydb';
   
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
   | EXPLAIN                                                                    
                                                                         |
   
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
   | -> Index lookup on NOTIFICATION_LOG using NOTIFICATION_LOG_UNIQUE_DB 
(DB_NAME='mydb')  (cost=0.9 rows=4) (actual time=0.276..0.738 rows=4 loops=1)
    |
   
+-----------------------------------------------------------------------------------------------------------------------------------------------------+`
   But the real use case query would definitely have a range of conditions on 
the event_id (eg: event_id>100), but apparently the first index is being used 
'NOTIFICATION_LOG_EVENT_ID' instead of 'NOTIFICATION_LOG_UNIQUE_DB'. This is 
something not desired.
   `mysql> explain analyze select * from NOTIFICATION_LOG where db_name='mydb' 
and (tbl_name='tb1' or tbl_name='tb2') and event_id>1;
   
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | EXPLAIN                                                                    
                                                                                
                                                                                
                                                                                
                                                                                
                  |
   
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | -> Filter: ((notification_log.DB_NAME = 'mydb') and 
((notification_log.TBL_NAME = 'tb1') or (notification_log.TBL_NAME = 'tb2')))  
(cost=1.61 rows=1.31) (actual time=0.0784..0.0969 rows=2 loops=1)
       -> Index range scan on NOTIFICATION_LOG using NOTIFICATION_LOG_EVENT_ID 
over (1 < EVENT_ID), with index condition: (notification_log.EVENT_ID > 1)  
(cost=1.61 rows=3) (actual time=0.0731..0.0902 rows=3 loops=1)
    |
   
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+`
   So in summary the second index I intend to introduce has no effect. There is 
no point in forcing the DBMS to use the index because it is inefficient in 
reading the results and has an overhead to maintain the second index. I would 
rather remove the newly introduced index. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to