[
https://issues.apache.org/jira/browse/HIVE-25879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480430#comment-17480430
]
Miklos Szurap commented on HIVE-25879:
--------------------------------------
Tested the change on a cluster with 13k databases in DBS. Enabled DEBUG level
logging for the MetastoreDirectSqlUtils class:
{code:java}
2022-01-19 19:06:07,205 DEBUG
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [main]: Direct SQL
query in 14.483568ms + 0.035888ms, the query is [select "DB_ID" from "DBS"]
2022-01-19 19:06:07,297 DEBUG
org.apache.hadoop.hive.metastore.MetaStoreDirectSql: [main]: Using direct SQL,
underlying DB is MYSQL
...
2022-01-19 19:09:20,310 DEBUG
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-2]:
Direct SQL query in 10.274248ms + 0.035611ms, the query is [select "DB_ID" from
"DBS"]
...
2022-01-19 19:14:10,270 DEBUG
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-6]:
Direct SQL query in 116.101829ms + 0.033451ms, the query is [select "DB_ID"
from "DBS"]
...
2022-01-19 19:15:35,195 DEBUG
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-7]:
Direct SQL query in 36.400028ms + 0.025065ms, the query is [select "DB_ID" from
"DBS"] {code}
The average runtime/delay for the test query with the DBS table is 39 ms:
{code:java}
$ cat hadoop-cmf-HIVE-1-HIVEMETASTORE-*.log.out* | grep
"MetastoreDirectSqlUtils.*select \"DB_ID\" from \"DBS\"" | grep -oP "SQL query
in [0-9]+" | awk '{print $NF}' | awk '{sum += $1 } {count+=1 } END { print
"Count: " count ", AVG: " sum/count }'
Count: 497, AVG: 39.6258 {code}
Using VERSION table for the test query:
{code:java}
2022-01-21 12:07:26,091 DEBUG
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-7]:
Direct SQL query in 2.258936ms + 0.007412ms, the query is [select "VER_ID" from
"VERSION"]
...
2022-01-21 12:07:34,688 DEBUG
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-8]:
Direct SQL query in 3.48132ms + 0.013131ms, the query is [select "VER_ID" from
"VERSION"]
...
2022-01-21 12:08:15,329 DEBUG
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-9]:
Direct SQL query in 1.042055ms + 0.005383ms, the query is [select "VER_ID" from
"VERSION"] {code}
The average runtime/delay for the test query with the VERSION table is 3ms:
{code:java}
$ cat hadoop-cmf-HIVE-1-HIVEMETASTORE-*.log.out* | grep
"MetastoreDirectSqlUtils.*select \"VER_ID\" from \"VERSION\"" | grep -oP "SQL
query in [0-9]+" | awk '{print $NF}' | awk '{sum += $1 } {count+=1 } END {
print "Count: " count ", AVG: " sum/count }'
Count: 127, AVG: 3.71654 {code}
Submitted a PR for this.
> MetaStoreDirectSql test query should not query the DBS table
> ------------------------------------------------------------
>
> Key: HIVE-25879
> URL: https://issues.apache.org/jira/browse/HIVE-25879
> Project: Hive
> Issue Type: Bug
> Reporter: Miklos Szurap
> Assignee: Miklos Szurap
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The runTestQuery() in the
> org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query
> {code:java}
> select "DB_ID" from "DBS"{code}
> to determine whether the direct SQL can be used.
> With larger deployments with many (10k+) Hive databases it would be more
> efficienct to query a small table instead, for example the "VERSION" table
> should always have a single row only.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)