[
https://issues.apache.org/jira/browse/DRILL-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036677#comment-15036677
]
Jinfeng Ni commented on DRILL-4127:
-----------------------------------
For a Hive storage plugin with about 8 schemas/databases, I ran a simple
query like this:
select count(*) from hive.table1;
From hive.log, we saw the following numbers of Hive metastore API calls:
Without the patch, and with impersonation turned on:
1. # of get_all_databases API calls: 31
2. # of get_all_tables API calls: 30
3. # of get_table API calls: 2
That explains why some Drill users report that Drill spent 20-30 seconds
planning such a simple query, making the query not "interactive" at all.
> HiveSchema.getSubSchema() should use lazy loading of all the table names
> ------------------------------------------------------------------------
>
> Key: DRILL-4127
> URL: https://issues.apache.org/jira/browse/DRILL-4127
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
>
> Currently, HiveSchema.getSubSchema() pre-loads all the table names when
> it constructs the subschema, even though those table names are not requested
> at all. This can cause considerable performance overhead, especially
> when the Hive schema contains a large # of objects (thousands of tables/views
> are not uncommon in some use cases).
> Instead, we should change the loading of table names to on-demand. Only when
> there is a request to get all table names do we load them into the Hive schema.
> This should help "show schemas", since it only requires the schema names, not
> the table names in each schema.
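
The on-demand loading described in the issue could look roughly like the
sketch below. This is not the actual patch: MetastoreClient, CountingClient
and LazySubSchema are hypothetical names made up for illustration, and the
client is a stand-in that only counts calls rather than talking to a real
Hive metastore. The point is only that constructing the subschema and
reading its name trigger no get_all_tables call, and that repeated requests
for the table names trigger exactly one.

```java
import java.util.Arrays;
import java.util.List;

public class LazyTableNamesDemo {

    // Hypothetical stand-in for a metastore client; not Drill's or Hive's real API.
    interface MetastoreClient {
        List<String> getAllTables(String dbName);
    }

    // Fake client that counts calls so the effect of lazy loading is visible.
    static class CountingClient implements MetastoreClient {
        int getAllTablesCalls = 0;

        @Override
        public List<String> getAllTables(String dbName) {
            getAllTablesCalls++;
            return Arrays.asList("table1", "table2");
        }
    }

    // Hypothetical subschema that defers loading table names until first requested.
    static class LazySubSchema {
        private final String dbName;
        private final MetastoreClient client;
        private List<String> tableNames; // stays null until the first request

        LazySubSchema(String dbName, MetastoreClient client) {
            // No metastore call here: construction only records what to load later.
            this.dbName = dbName;
            this.client = client;
        }

        String getName() {
            return dbName;
        }

        synchronized List<String> getTableNames() {
            if (tableNames == null) {
                tableNames = client.getAllTables(dbName); // loaded once, then cached
            }
            return tableNames;
        }
    }

    public static void main(String[] args) {
        CountingClient client = new CountingClient();
        LazySubSchema schema = new LazySubSchema("default", client);

        schema.getName(); // enough for something like "show schemas"
        System.out.println(client.getAllTablesCalls); // prints 0

        schema.getTableNames();
        schema.getTableNames();
        System.out.println(client.getAllTablesCalls); // prints 1
    }
}
```

A real implementation would also have to decide when the cached names become
stale; this sketch caches them for the lifetime of the subschema object.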