[ https://issues.apache.org/jira/browse/IMPALA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon resolved IMPALA-7224.
---------------------------------
Resolution: Fixed
Fix Version/s: Impala 3.1.0
> UpdateCatalogMetrics very slow when there are many tables
> ---------------------------------------------------------
>
> Key: IMPALA-7224
> URL: https://issues.apache.org/jira/browse/IMPALA-7224
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
> Fix For: Impala 3.1.0
>
>
> impalad calls UpdateCatalogMetrics after each statement that is treated as a
> DDL. This includes statements like USE, SHOW TABLES, and DESCRIBE, which
> don't actually change the number of tables in the catalog and therefore
> probably don't need to update metrics. That aside, even when the metrics _do_
> need to be updated, the implementation is very slow. It calls getTableNames
> on each database, which results in (a) creating an array of all the names,
> (b) sorting that array, and (c) encoding/decoding that whole array into
> Thrift. This is very expensive: on a use case with approximately 8M tables,
> each such call takes 10-12 seconds of CPU, most of which is spent in sorting
> and encoding. All that's really needed is a _count_ of tables, which could be
> fetched directly.
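The difference between the two approaches can be illustrated with a small Java sketch. It assumes a hypothetical in-memory catalog (a map from database name to a map of table names); the class `CatalogMetricsSketch` and its methods are illustrative only and are not Impala's actual catalog code or its fix. It models steps (a) and (b) from the description (copying and sorting every name) but not the Thrift encoding, and contrasts that with reading each database's table count directly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical catalog model for illustration; not Impala's real classes.
class CatalogMetricsSketch {
  // database name -> (table name -> table metadata placeholder)
  private final Map<String, Map<String, Object>> dbs = new HashMap<>();

  void addTable(String db, String table) {
    dbs.computeIfAbsent(db, k -> new HashMap<>()).put(table, new Object());
  }

  // Expensive pattern described in the issue: materialize and sort every
  // table name only to learn how many there are.
  long countTablesViaNames() {
    long total = 0;
    for (Map<String, Object> tables : dbs.values()) {
      List<String> names = new ArrayList<>(tables.keySet()); // O(n) copy
      Collections.sort(names);                                // O(n log n) sort
      total += names.size();
    }
    return total;
  }

  // Cheaper alternative: ask each database's table map for its size.
  long countTablesDirectly() {
    long total = 0;
    for (Map<String, Object> tables : dbs.values()) {
      total += tables.size(); // constant time per database for a HashMap
    }
    return total;
  }

  public static void main(String[] args) {
    CatalogMetricsSketch catalog = new CatalogMetricsSketch();
    catalog.addTable("db1", "t1");
    catalog.addTable("db1", "t2");
    catalog.addTable("db2", "t3");
    System.out.println(catalog.countTablesViaNames()); // 3
    System.out.println(catalog.countTablesDirectly()); // 3
  }
}
```

Both methods return the same number, but only the first one allocates and sorts a list proportional to the number of tables in each database, which is the cost that dominates at millions of tables.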