[
https://issues.apache.org/jira/browse/DRILL-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dechang Gu closed DRILL-4126.
-----------------------------
Verified with Apache Drill 1.5.0 (git id 3f228d3) against the commit (git id
539cbba) prior to the patch, querying INFORMATION_SCHEMA. Significant reduction
in the number of function calls to the Hive API.
Before the patch (git id 539cbba):
-- get_all_databases was called 340 times
-- get_all_tables was called 336 times.
With the patch (Apache Drill 1.5.0, git id 3f228d3), for the same query and same databases:
-- get_all_databases was only called 2 times, and
-- get_all_tables was called 38 times.
So the fix LGTM, and the JIRA is closed.
> Adding HiveMetaStore caching when impersonation is enabled.
> ------------------------------------------------------------
>
> Key: DRILL-4126
> URL: https://issues.apache.org/jira/browse/DRILL-4126
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
> Fix For: 1.5.0
>
>
> Currently, HiveMetastore caching is used only when impersonation is disabled,
> such that all HiveMetastore calls go through
> NonCloseableHiveClientWithCaching [1]. However, if impersonation is enabled,
> caching is not used for HiveMetastore access.
> This could significantly increase the planning time when the Hive storage
> plugin is enabled, or when running a query against INFORMATION_SCHEMA.
> Depending on the number of databases/tables in the Hive storage plugin, the
> planning time or the INFORMATION_SCHEMA query time could become unacceptable.
> This becomes even worse if the Hive metastore is running on a different node
> from the drillbit, making access to the metastore even slower.
> We are seeing that planning, or execution of an INFORMATION_SCHEMA query,
> can take 30 to 60 seconds. Such long planning or execution times prevent
> Drill from acting "interactively" for these queries.
> We should enable caching when impersonation is used. As long as the
> authorizer verifies that the user has access to the databases/tables, we
> should get the data from the cache. By doing that, we should see a reduced
> number of API calls to HiveMetaStore.
> [1]
> https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/DrillHiveMetaStoreClient.java#L299
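
The idea in the description above (authorize each user on every request, but
serve the metadata from a shared cache) can be sketched roughly as follows.
This is a minimal illustration under stated assumptions, not Drill's actual
implementation: Metastore, Authorizer, CountingMetastore and CachingClient are
hypothetical names invented for this sketch, and the real
DrillHiveMetaStoreClient is structured differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CachedMetastoreSketch {

    /** Hypothetical stand-in for the metastore API; not the real Hive client. */
    interface Metastore {
        List<String> getAllDatabases();
    }

    /** Hypothetical per-user access check. */
    interface Authorizer {
        boolean canListDatabases(String user);
    }

    /** Fake metastore that counts how often it is actually called. */
    static final class CountingMetastore implements Metastore {
        final AtomicInteger calls = new AtomicInteger();

        @Override
        public List<String> getAllDatabases() {
            calls.incrementAndGet();
            return Arrays.asList("default", "sales");
        }
    }

    /** Checks authorization on every call, but loads the data only once. */
    static final class CachingClient {
        private final Metastore metastore;
        private final Authorizer authorizer;
        private List<String> cachedDatabases; // null until the first load

        CachingClient(Metastore metastore, Authorizer authorizer) {
            this.metastore = metastore;
            this.authorizer = authorizer;
        }

        synchronized List<String> getAllDatabases(String user) {
            // The access check is never skipped, even on a cache hit.
            if (!authorizer.canListDatabases(user)) {
                throw new SecurityException("User " + user + " may not list databases");
            }
            // Only the first authorized caller reaches the metastore.
            if (cachedDatabases == null) {
                cachedDatabases = metastore.getAllDatabases();
            }
            return cachedDatabases;
        }
    }

    public static void main(String[] args) {
        CountingMetastore metastore = new CountingMetastore();
        // Assumption for the demo: everyone except "guest" is authorized.
        CachingClient client = new CachingClient(metastore, user -> !user.equals("guest"));
        client.getAllDatabases("alice");
        client.getAllDatabases("bob");
        System.out.println("metastore calls: " + metastore.calls.get());
    }
}
```

In this sketch two different authorized users trigger a single metastore call.
A real cache would also need expiry or invalidation so that changes in Hive
become visible, which is left out here.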
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)