[
https://issues.apache.org/jira/browse/HIVE-26649?focusedWorklogId=837785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-837785
]
ASF GitHub Bot logged work on HIVE-26649:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Jan/23 00:20
Start Date: 09/Jan/23 00:20
Worklog Time Spent: 10m
Work Description: github-actions[bot] closed pull request #3714:
[WIP][HIVE-26649] [Metastore] Hive metabase performance issues due to slow…
URL: https://github.com/apache/hive/pull/3714
Issue Time Tracking
-------------------
Worklog Id: (was: 837785)
Remaining Estimate: 95h 10m (was: 95h 20m)
Time Spent: 50m (was: 40m)
> Hive metabase performance issues due to slow queries
> ----------------------------------------------------
>
> Key: HIVE-26649
> URL: https://issues.apache.org/jira/browse/HIVE-26649
> Project: Hive
> Issue Type: Improvement
> Components: Metastore
> Affects Versions: 2.3.3
> Environment: metastore db :mysql 5.X
> hive:2.3.3
> Reporter: yihangqiao
> Assignee: yihangqiao
> Priority: Major
> Labels: metastore, patch, performance, pull-request-available
> Attachments: image-2022-10-19-14-42-33-073.png
>
> Original Estimate: 96h
> Time Spent: 50m
> Remaining Estimate: 95h 10m
>
> When the Hive metabase uses Mysql, during the peak period of Hive statement
> query, the metastore initiates a large amount of DirectSQL, which will cause
> performance problems in the metabase. The fundamental reason is that some
> DirectSQL performance problems cause a large number of slow queries at the DB
> level.
> For example for the following Hive query:
> {code:java}
> select * from imd_fcac_safe.fcac_dw_loan_details where ds='2021-10-10' and
> sysid='MCFCM' {code}
> where ds and sysid are the primary and secondary partitions of the
> imd_fcac_safe.fcac_dw_loan_details table, respectively
>
> The Hive statement will generate the DirectSQL query as follows:
> {code:java}
> explain select PARTITIONS.PART_ID from PARTITIONS inner join TBLS on
> PARTITIONS.TBL_ID = TBLS.TBL_ID and TBLS.TBL_NAME =
> 'fcac_dw_loan_details' inner join DBS on TBLS.DB_ID = DBS.DB_ID and
> DBS.NAME = 'imd_fcac_safe' inner join PARTITION_KEY_VALS FILTER0 on
> FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 inner join
> PARTITION_KEY_VALS FILTER1 on FILTER1.PART_ID = PARTITIONS.PART_ID and
> FILTER1.INTEGER_IDX = 1 where ( ((FILTER0.PART_KEY_VAL = '2021-10-10') and
> (FILTER1.PART_KEY_VAL = 'MCFCM')) ) {code}
> !image-2022-10-19-14-42-33-073.png!
>
> Problems with this statement
> There is no TBL_ID field in the PARTITION_KEY_VALS table, which will cause
> the partition of the same name of the unrelated table to be described when
> performing an associated query; there is no index column in the
> PARTITION_KEY_VAL table, so it cannot be accelerated by the index.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)