[
https://issues.apache.org/jira/browse/DRILL-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228774#comment-15228774
]
ASF GitHub Bot commented on DRILL-4577:
---------------------------------------
Github user vkorukanti commented on a diff in the pull request:
https://github.com/apache/drill/pull/461#discussion_r58753354
--- Diff:
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/schema/HiveDatabaseSchema.java
---
@@ -72,4 +80,76 @@ public String getTypeName() {
     return HiveStoragePluginConfig.NAME;
   }
+  @Override
+  public List<Pair<String, ? extends Table>> getTablesByNames(final List<String> tableNames) {
+    final String schemaName = getName();
+    final List<Pair<String, ? extends Table>> tableNameToTable = Lists.newArrayList();
+    List<org.apache.hadoop.hive.metastore.api.Table> tables;
+    // Retries once if the first call to fetch the metadata fails
+    synchronized(mClient) {
+      final List<String> tableNamesWithAuth = Lists.newArrayList();
+      for(String tableName : tableNames) {
+        try {
+          if(mClient.tableExists(schemaName, tableName)) {
--- End diff --
Here you are making an RPC call for every table. I thought that, for perf
reasons, we wanted to avoid an RPC call per table and instead use
```getTableObjectsByName``` to get the data for all tables in one RPC call.
How does this patch improve perf?
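
To make the concern concrete, here is a rough sketch that counts round trips for per-name lookups versus a single batched fetch. It does not use the real Hive metastore client: `CountingClient`, `getTable`, `getTablesByName`, `fetchOneByOne`, and `fetchBatched` are hypothetical names, and the in-memory map only stands in for a remote metastore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a fake client that counts round trips. Not the Hive API.
class RpcCountDemo {

  /** In-memory stand-in for a remote metastore; every method is one "RPC". */
  static class CountingClient {
    private final Map<String, String> tables = new LinkedHashMap<>();
    int rpcCalls = 0;

    CountingClient(List<String> names) {
      for (String n : names) {
        tables.put(n, "table:" + n);
      }
    }

    // One round trip per call, analogous to a per-table existence check.
    boolean tableExists(String name) {
      rpcCalls++;
      return tables.containsKey(name);
    }

    // One round trip per call, analogous to a per-table fetch.
    String getTable(String name) {
      rpcCalls++;
      return tables.get(name);
    }

    // One round trip for the whole list; unknown names are skipped.
    List<String> getTablesByName(List<String> names) {
      rpcCalls++;
      List<String> result = new ArrayList<>();
      for (String n : names) {
        String t = tables.get(n);
        if (t != null) {
          result.add(t);
        }
      }
      return result;
    }
  }

  // Per-name pattern: the number of round trips grows with the list size.
  static List<String> fetchOneByOne(CountingClient client, List<String> names) {
    List<String> result = new ArrayList<>();
    for (String n : names) {
      if (client.tableExists(n)) {
        result.add(client.getTable(n));
      }
    }
    return result;
  }

  // Batched pattern: a single round trip regardless of the list size.
  static List<String> fetchBatched(CountingClient client, List<String> names) {
    return client.getTablesByName(names);
  }

  public static void main(String[] args) {
    List<String> stored = Arrays.asList("a", "b", "c");
    List<String> wanted = Arrays.asList("a", "b", "c", "missing");

    CountingClient perTable = new CountingClient(stored);
    fetchOneByOne(perTable, wanted);
    System.out.println("per-table round trips: " + perTable.rpcCalls);

    CountingClient batched = new CountingClient(stored);
    fetchBatched(batched, wanted);
    System.out.println("batched round trips: " + batched.rpcCalls);
  }
}
```

Both patterns return the same tables, and only the round-trip count differs. That count is what matters when each call crosses the network to the metastore.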
> Improve performance for query on INFORMATION_SCHEMA when HIVE is plugged in
> ---------------------------------------------------------------------------
>
> Key: DRILL-4577
> URL: https://issues.apache.org/jira/browse/DRILL-4577
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Hive
> Reporter: Sean Hsuan-Yi Chu
> Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> A query such as
> {code}
> select * from INFORMATION_SCHEMA.`TABLES`
> {code}
> is converted into calls that fetch all tables from the storage plugins.
> When users have Hive plugged in, the calls to the Hive metastore would be:
> 1) get_table
> 2) get_partitions
> However, the information regarding partitions is not used in this type of
> query. Besides, a more efficient way to fetch tables is to use the
> get_multi_table call.