[ 
https://issues.apache.org/jira/browse/HIVE-25879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480430#comment-17480430
 ] 

Miklos Szurap commented on HIVE-25879:
--------------------------------------

Tested the change on a cluster with 13k databases in DBS. Enabled DEBUG level 
logging for the MetastoreDirectSqlUtils class:
{code:java}
2022-01-19 19:06:07,205 DEBUG 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [main]: Direct SQL 
query in 14.483568ms + 0.035888ms, the query is [select "DB_ID" from "DBS"]
2022-01-19 19:06:07,297 DEBUG 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql: [main]: Using direct SQL, 
underlying DB is MYSQL
...
2022-01-19 19:09:20,310 DEBUG 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-2]: 
Direct SQL query in 10.274248ms + 0.035611ms, the query is [select "DB_ID" from 
"DBS"]
...
2022-01-19 19:14:10,270 DEBUG 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-6]: 
Direct SQL query in 116.101829ms + 0.033451ms, the query is [select "DB_ID" 
from "DBS"]
...
2022-01-19 19:15:35,195 DEBUG 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-7]: 
Direct SQL query in 36.400028ms + 0.025065ms, the query is [select "DB_ID" from 
"DBS"] {code}
The average runtime/delay for the test query with the DBS table is 39 ms:
{code:java}
$ cat hadoop-cmf-HIVE-1-HIVEMETASTORE-*.log.out* | grep 
"MetastoreDirectSqlUtils.*select \"DB_ID\" from \"DBS\"" | grep -oP "SQL query 
in [0-9]+" | awk '{print $NF}' | awk '{sum += $1 } {count+=1 } END { print 
"Count: " count ", AVG: " sum/count }'
Count: 497, AVG: 39.6258 {code}
Using VERSION table for the test query:
{code:java}
2022-01-21 12:07:26,091 DEBUG 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-7]: 
Direct SQL query in 2.258936ms + 0.007412ms, the query is [select "VER_ID" from 
"VERSION"]
...
2022-01-21 12:07:34,688 DEBUG 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-8]: 
Direct SQL query in 3.48132ms + 0.013131ms, the query is [select "VER_ID" from 
"VERSION"]
...
2022-01-21 12:08:15,329 DEBUG 
org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils: [pool-9-thread-9]: 
Direct SQL query in 1.042055ms + 0.005383ms, the query is [select "VER_ID" from 
"VERSION"] {code}
The average runtime/delay for the test query with the VERSION table is 3ms:
{code:java}
$ cat hadoop-cmf-HIVE-1-HIVEMETASTORE-*.log.out* | grep 
"MetastoreDirectSqlUtils.*select \"VER_ID\" from \"VERSION\"" | grep -oP "SQL 
query in [0-9]+" | awk '{print $NF}' | awk '{sum += $1 } {count+=1 } END { 
print "Count: " count ", AVG: " sum/count }'
Count: 127, AVG: 3.71654 {code}
Submitted a PR for this.

> MetaStoreDirectSql test query should not query the DBS table
> ------------------------------------------------------------
>
>                 Key: HIVE-25879
>                 URL: https://issues.apache.org/jira/browse/HIVE-25879
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Miklos Szurap
>            Assignee: Miklos Szurap
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The runTestQuery() in the 
> org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java is using a test query
> {code:java}
> select "DB_ID" from "DBS"{code}
> to determine whether the direct SQL can be used.
> With larger deployments with many (10k+) Hive databases it would be more 
> efficienct to query a small table instead, for example the "VERSION" table 
> should always have a single row only.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to