[ https://issues.apache.org/jira/browse/HIVE-28899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946988#comment-17946988 ]
Jihye Seo edited comment on HIVE-28899 at 4/24/25 9:33 AM: ----------------------------------------------------------- Sharing the test results after a query optimization, which led to a significant reduction in CPU usage—approximately 8x improvement. h3. HMS Environment * Total number of databases: 4,304 ({{{}DBS{}}} table row count) * Total number of tables: 105,350 ({{{}TBLS{}}} table row count) * Executed 30 parallel runs of retrieving metadata for 500 tables each ** Resulting in a total of 15,000 concurrent {{get_table_metas}} calls h3. Before Query Optimization * {*}CPU Usage{*}: * ** ~20.69% * ** !스크린샷 2025-04-24 오후 6.27.31.png|width=516,height=196! * {*}InnoDB Rows Read{*}: * ** Up to 970,000 rows read * ** !image-2025-04-24-18-28-21-577.png|width=520,height=199! h3. After Query Optimization * {*}CPU Usage{*}: * ** ~3.3% * ** !스크린샷 2025-04-24 오후 6.28.44.png|width=493,height=186! * {*}InnoDB Rows Read{*}: * ** Reduced to around 100,000 rows at most * ** !스크린샷 2025-04-24 오후 6.29.42.png|width=472,height=178! was (Author: JIRAUSER298233): Sharing the test results after a query optimization, which led to a significant reduction in CPU usage—approximately 8x improvement. h3. HMS Environment * Total number of databases: 4,304 ({{{}DBS{}}} table row count) * Total number of tables: 105,350 ({{{}TBLS{}}} table row count) * Executed 30 parallel runs of retrieving metadata for 500 tables each ** Resulting in a total of 15,000 concurrent {{get_table_metas}} calls h3. Before Query Optimization * {*}CPU Usage{*}: ** ~20.69% ** !스크린샷 2025-04-24 오후 6.27.31.png|width=516,height=196! * {*}InnoDB Rows Read{*}: ** Up to 970,000 rows read ** !image-2025-04-24-18-28-21-577.png|width=520,height=199! h3. After Query Optimization * {*}CPU Usage{*}: ** ~3.3% ** !스크린샷 2025-04-24 오후 6.28.44.png|width=493,height=186! * {*}InnoDB Rows Read{*}: ** Reduced to around 100,000 rows at most ** !스크린샷 2025-04-24 오후 6.29.42.png|width=472,height=178! > HMS 'get_table_meta' causes full scan on TBLS and increases MySQL CPU usage > --------------------------------------------------------------------------- > > Key: HIVE-28899 > URL: https://issues.apache.org/jira/browse/HIVE-28899 > Project: Hive > Issue Type: Improvement > Affects Versions: 3.1.0 > Reporter: Jihye Seo > Priority: Major > Attachments: image-2025-04-24-18-28-21-577.png, 스크린샷 2025-04-24 > 오후 6.27.31.png, 스크린샷 2025-04-24 오후 6.28.44.png, 스크린샷 > 2025-04-24 오후 6.29.42.png > > > The SQL query generated by Hive Metastore’s {{get_table_meta}} call leads to > a full table scan on the {{TBLS}} table, even though the {{DB_ID}} column is > indexed. This causes serious performance degradation, especially when the > number of tables is large. > In our production environment, this resulted in *high MySQL CPU usage* during > periods of frequent {{get_table_meta}} access. > > Problematic Query: > {code:java} > SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MTable' AS > `NUCLEUS_TYPE`, > A0.CREATE_TIME, > A0.LAST_ACCESS_TIME, > A0.OWNER, > A0.OWNER_TYPE, > A0.RETENTION, > A0.IS_REWRITE_ENABLED, > A0.TBL_NAME, > A0.TBL_TYPE, > A0.WRITE_ID, > A0.TBL_ID > FROM TBLS A0 > LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID > WHERE B0.CTLG_NAME = 'hive' > AND LOWER(B0.NAME) LIKE 'test' ESCAPE '\\'; {code} > EXPLAIN Result: > {code:java} > +----+-------------+-------+------------+--------+------------------+---------+---------+-------------------+-------+----------+-----------------------+ > | id | select_type | table | partitions | type | possible_keys | key > | key_len | ref | rows | filtered | Extra | > +----+-------------+-------+------------+--------+------------------+---------+---------+-------------------+-------+----------+-----------------------+ > | 1 | SIMPLE | A0 | NULL | ALL | TBLS_N49 | NULL > | NULL | NULL | 76998 | 100.00 | Using temporary | > | 1 | SIMPLE | B0 | NULL | eq_ref | PRIMARY,CTLG_FK1 | PRIMARY > | 8 | hive_c3s.A0.DB_ID | 1 | 98.75 | Using where; Distinct | > +----+-------------+-------+------------+--------+------------------+---------+---------+-------------------+-------+----------+-----------------------+{code} > In the plan above: > * {{TBLS}} performs a *full scan* ({{{}type = ALL{}}}) > * This is because the optimizer is forced to start with {{{}TBLS{}}}, as the > {{DBS.NAME}} filter cannot utilize the index due to the {{LOWER()}} function > Removing {{LOWER()}} Fixes the Issue: > When we modify the query by removing the {{LOWER()}} function, > {code:java} > ... WHERE B0.CTLG_NAME = 'hive' AND B0.NAME LIKE 'test' ESCAPE '\\';{code} > We get the following improved plan. > {code:java} > +----+-------------+-------+------------+-------+----------------------------------+-----------------+---------+-------------------+------+----------+-------------------------------------------+ > | id | select_type | table | partitions | type | possible_keys > | key | key_len | ref | rows | filtered | > Extra | > +----+-------------+-------+------------+-------+----------------------------------+-----------------+---------+-------------------+------+----------+-------------------------------------------+ > | 1 | SIMPLE | B0 | NULL | range | > PRIMARY,UNIQUE_DATABASE,CTLG_FK1 | UNIQUE_DATABASE | 389 | NULL > | 1 | 100.00 | Using where; Using index; Using temporary | > | 1 | SIMPLE | A0 | NULL | ref | TBLS_N49 > | TBLS_N49 | 9 | hive_c3s.B0.DB_ID | 45 | 100.00 | NULL > | > +----+-------------+-------+------------+-------+----------------------------------+-----------------+---------+-------------------+------+----------+-------------------------------------------+{code} > Here: > * The optimizer is able to *start with {{DBS}}* using its {{NAME}} index > * Then it joins to {{TBLS}} using the {{DB_ID}} index, avoiding a full scan > > We’d like to ask if it’s feasible to avoid using the {{LOWER()}} function > when matching database and table name patterns, so that the underlying > database engine (e.g., MySQL) can utilize indexes more effectively. -- This message was sent by Atlassian Jira (v8.20.10#820010)