[ https://issues.apache.org/jira/browse/IMPALA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved IMPALA-7224.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0

> UpdateCatalogMetrics very slow when there are many tables
> ---------------------------------------------------------
>
>                 Key: IMPALA-7224
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7224
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>             Fix For: Impala 3.1.0
>
>
> impalad calls UpdateCatalogMetrics after each statement that is classified as
> DDL. This includes statements such as USE, SHOW TABLES and DESCRIBE, which
> don't actually change the number of tables in the catalog and therefore
> probably don't need to update metrics. That aside, even when the metrics _do_
> need to be updated, the implementation is very slow. It calls getTableNames
> on each database, which (a) creates an array of all the table names,
> (b) sorts that array and (c) encodes and decodes that whole array through
> Thrift. This is very expensive: in a use case with approximately 8M tables,
> each such call takes 10-12 seconds of CPU time, most of it spent in sorting
> and encoding. All that's really needed is a _count_ of tables, which could be
> fetched directly.
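
A minimal Java sketch of the two changes the description points toward: skipping the metrics refresh for statements that cannot change the table count, and summing per-database counts instead of materializing and sorting every table name. The classes and methods here (Db, DdlKind, countTablesSlow, countTablesFast, needsMetricsUpdate) are hypothetical stand-ins, not Impala's actual catalog API, and the Thrift encode/decode step is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins for catalog classes (not Impala's API).
public class CatalogMetricsSketch {

  // Hypothetical statement kinds; only some of them can change the table count.
  enum DdlKind { USE, SHOW_TABLES, DESCRIBE, CREATE_TABLE, DROP_TABLE, DROP_DATABASE }

  static final Set<DdlKind> CHANGES_TABLE_COUNT =
      EnumSet.of(DdlKind.CREATE_TABLE, DdlKind.DROP_TABLE, DdlKind.DROP_DATABASE);

  static class Db {
    private final Map<String, Object> tables = new HashMap<>();

    void addTable(String name) { tables.put(name, new Object()); }

    // Slow path from the issue: copy every name into a new list, then sort it.
    List<String> getTableNames() {
      List<String> names = new ArrayList<>(tables.keySet());
      Collections.sort(names); // O(n log n) per database
      return names;
    }

    // Fast path: the map already tracks its size; no copying or sorting.
    int getNumTables() { return tables.size(); }
  }

  // Builds and sorts a full name list per database only to read its length.
  static long countTablesSlow(List<Db> dbs) {
    long total = 0;
    for (Db db : dbs) total += db.getTableNames().size();
    return total;
  }

  // Sums per-database counts directly.
  static long countTablesFast(List<Db> dbs) {
    long total = 0;
    for (Db db : dbs) total += db.getNumTables();
    return total;
  }

  // Statements such as USE, SHOW TABLES or DESCRIBE skip the refresh entirely.
  static boolean needsMetricsUpdate(DdlKind kind) {
    return CHANGES_TABLE_COUNT.contains(kind);
  }

  public static void main(String[] args) {
    Db a = new Db();
    a.addTable("t1");
    a.addTable("t2");
    Db b = new Db();
    b.addTable("t3");
    List<Db> dbs = Arrays.asList(a, b);
    System.out.println(countTablesSlow(dbs) + " " + countTablesFast(dbs)); // prints "3 3"
    System.out.println(needsMetricsUpdate(DdlKind.SHOW_TABLES)); // prints "false"
  }
}
```

Both counting functions return the same total; the difference is that the slow variant allocates and sorts a list proportional to the number of tables on every call, which is the cost the issue measures at 8M tables.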



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)