VenuReddy2103 commented on code in PR #4569:
URL: https://github.com/apache/hive/pull/4569#discussion_r1353584451
##########
standalone-metastore/metastore-server/src/main/sql/mssql/hive-schema-4.0.0-beta-2.mssql.sql:
##########
@@ -661,6 +661,8 @@ ALTER TABLE NOTIFICATION_LOG ADD CONSTRAINT
NOTIFICATION_LOG_PK PRIMARY KEY (NL_
CREATE UNIQUE INDEX NOTIFICATION_LOG_EVENT_ID ON NOTIFICATION_LOG (EVENT_ID);
+CREATE UNIQUE INDEX NOTIFICATION_LOG_UNIQUE_DB ON NOTIFICATION_LOG (DB_NAME,
EVENT_ID);
Review Comment:
I agree with @dengzhhu653
Optimizer would check the potential indexes available and decides to choose
one(could be single index or composite index) among them or not to choose any
index based on various factors. Since we have index with single
column(`EVENT_ID`), in query having where clause with eventId + may be dbName +
may be table name etc, optimizer detects the eventId index as potential
candidate. Have captured explain queries on mysql for index used and not used
cases.
Index used:
```
mysql> explain select * from NOTIFICATION_LOG where db_name='mydb' and
event_id>8;
+----+-------------+------------------+------------+-------+---------------------------+---------------------------+---------+------+------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys
| key | key_len | ref | rows | filtered |
Extra |
+----+-------------+------------------+------------+-------+---------------------------+---------------------------+---------+------+------+----------+------------------------------------+
| 1 | SIMPLE | NOTIFICATION_LOG | NULL | range |
NOTIFICATION_LOG_EVENT_ID | NOTIFICATION_LOG_EVENT_ID | 8 | NULL | 3 |
10.00 | Using index condition; Using where |
+----+-------------+------------------+------------+-------+---------------------------+---------------------------+---------+------+------+----------+------------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from NOTIFICATION_LOG where db_name='mydb' and
tbl_name='stage' and event_id>8;
+----+-------------+------------------+------------+-------+---------------------------+---------------------------+---------+------+------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys
| key | key_len | ref | rows | filtered |
Extra |
+----+-------------+------------------+------------+-------+---------------------------+---------------------------+---------+------+------+----------+------------------------------------+
| 1 | SIMPLE | NOTIFICATION_LOG | NULL | range |
NOTIFICATION_LOG_EVENT_ID | NOTIFICATION_LOG_EVENT_ID | 8 | NULL | 3 |
9.09 | Using index condition; Using where |
+----+-------------+------------------+------------+-------+---------------------------+---------------------------+---------+------+------+----------+------------------------------------+
1 row in set, 1 warning (0.00 sec)
```
Index doesn't seem to be used here:
```
mysql> explain select * from NOTIFICATION_LOG where event_id>12 and
db_name='mydb';
+----+-------------+------------------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys
| key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | NOTIFICATION_LOG | NULL | ALL |
NOTIFICATION_LOG_EVENT_ID | NULL | NULL | NULL | 29 | 5.86 | Using
where |
+----+-------------+------------------+------------+------+---------------------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
```
Have found some reference from mysql but this seem to be not very specific
to mysql.
https://dev.mysql.com/doc/refman/8.0/en/multiple-column-indexes.html#:~:text=If%20the%20table%20has%20a%20multiple,of%20(col1%2C%20col2%2C%20col3).
https://dev.mysql.com/doc/refman/8.0/en/index-condition-pushdown-optimization.html#:~:text=Get%20the%20next%20row%27s,on%20the%20test%20result.
Some dbs seem to provide a way to specify index hints in query explicilty to
always use an index.
If we add the dbname + eventId composite index, index updation/maintence
overhead may be high because index data need to be sorted by dbname first and
then by eventIds within the db on the huge notification log? It may improve the
query performance though.
May be good to compare the tradeoffs between using the eventId based index
vs dbname + eventId composite index ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]