[
https://issues.apache.org/jira/browse/IMPALA-11729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Smith resolved IMPALA-11729.
------------------------------------
Fix Version/s: Impala 4.5.0
Resolution: Fixed
> Investigate and improve impalad startup time
> --------------------------------------------
>
> Key: IMPALA-11729
> URL: https://issues.apache.org/jira/browse/IMPALA-11729
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Csaba Ringhofer
> Priority: Minor
> Labels: ramp-up
> Fix For: Impala 4.5.0
>
>
> impalad startup takes several seconds, including a few seconds before it even
> tries to connect to statestored. From a test run (release mode) with a parallel
> catalogd startup:
> {code}
> I1113 21:02:17.334743 4363 logging.cc:247] stdout will be logged to this file.
> I1113 21:02:18.968991 4363 JniFrontend.java:141] Java Input arguments:
> I1113 21:02:19.887519 4363 exec-env.cc:467] Starting statestore subscriber service
> {code}
> After connecting to the statestore, coordinators need to wait for the initial
> catalog update, and processing it takes time that depends on the number of
> catalog objects:
> {code}
> I1113 21:02:19.888423 4363 Frontend.java:1618] Waiting for local catalog to be initialized, attempt: 0
> I1113 21:02:21.888621 4363 Frontend.java:1618] Waiting for local catalog to be initialized, attempt: 1
> I1113 21:02:23.888849 4363 Frontend.java:1614] Local catalog initialized after: 4000 ms.
> I1113 21:02:23.890105 4363 impala-server.cc:3103] Impala has started.
> {code}
> Meanwhile, catalogd takes almost 2 seconds before it even tries to connect to HMS:
> {code}
> I1113 21:02:17.289606 4281 logging.cc:247] stdout will be logged to this file.
> I1113 21:02:19.023339 4281 HiveMetaStoreClient.java:720] Trying to connect to metastore with URI (thrift://localhost:9083) in binary transport mode
> I1113 21:02:21.671665 5028 catalog-server.cc:400] A catalog update with 1647 entries is assembled. Catalog version: 1649 Last sent catalog version: 0
> {code}
> The statestore starts up quickly, well before other components try to connect
> to it:
> {code}
> I1113 21:02:17.263167 4262 logging.cc:247] stdout will be logged to this file.
> I1113 21:02:17.268682 4262 thrift-server.cc:419] ThriftServer 'StatestoreService' started on port: 24000
> I1113 21:02:19.670817 4285 TAcceptQueueServer.cpp:355] New connection to server StatestoreService from client <Host: 127.0.0.1 Port: 44156>
> {code}
> While these 6 seconds at impalad (with ~2 seconds spent waiting for the initial
> catalog update) are not very bad, making startup quicker would be visible in
> test run times (custom cluster tests restart the cluster a lot) and in
> autoscaling scenarios. Finding out what takes the time during startup would
> also be a nice ramp-up task.
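
One way to quantify the gaps in the impalad log excerpt quoted above is to diff
the glog timestamps. The following is a minimal Python sketch, not part of
Impala. It assumes the usual glog prefix (severity letter plus MMDD, time with
microseconds, thread id, file:line]) and uses the quoted impalad lines,
unwrapped, as input. The names parse_glog and step_deltas are made up for this
illustration.

```python
import re
from datetime import datetime

# Assumed glog prefix: <severity><MMDD> <HH:MM:SS.micros> <thread id> <file:line>] <message>
GLOG_RE = re.compile(
    r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([^\]]+)\] (.*)$"
)

# impalad log lines from the issue description, with the email wrapping undone.
IMPALAD_LOG = [
    "I1113 21:02:17.334743 4363 logging.cc:247] stdout will be logged to this file.",
    "I1113 21:02:18.968991 4363 JniFrontend.java:141] Java Input arguments:",
    "I1113 21:02:19.887519 4363 exec-env.cc:467] Starting statestore subscriber service",
    "I1113 21:02:19.888423 4363 Frontend.java:1618] Waiting for local catalog to be initialized, attempt: 0",
    "I1113 21:02:23.888849 4363 Frontend.java:1614] Local catalog initialized after: 4000 ms.",
    "I1113 21:02:23.890105 4363 impala-server.cc:3103] Impala has started.",
]


def parse_glog(line):
    """Return (timestamp, source, message), or None if the line is not a glog line."""
    m = GLOG_RE.match(line)
    if m is None:
        return None
    mmdd, clock, _thread_id, source, message = m.groups()
    # glog omits the year; a placeholder leap year is enough because only
    # differences between timestamps are used.
    ts = datetime.strptime("2000" + mmdd + " " + clock, "%Y%m%d %H:%M:%S.%f")
    return ts, source, message


def step_deltas(lines):
    """Return (seconds since previous line, seconds since first line, source) per line."""
    parsed = [p for p in (parse_glog(line) for line in lines) if p is not None]
    if not parsed:
        return []
    first = prev = parsed[0][0]
    rows = []
    for ts, source, _message in parsed:
        rows.append(((ts - prev).total_seconds(), (ts - first).total_seconds(), source))
        prev = ts
    return rows


if __name__ == "__main__":
    for step, total, source in step_deltas(IMPALAD_LOG):
        print(f"+{step:9.6f}s  (total {total:9.6f}s)  {source}")
```

For the excerpt above, this attributes about 1.6 s to the span before the
JniFrontend line, 4.0 s to waiting for the local catalog, and about 6.6 s to
the whole startup. It only shows where the gaps are, not what runs inside them.
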
> The startup logic is single-threaded. I see the most potential in moving
> some independent tasks to separate threads. It is also possible that we are
> doing some completely unnecessary tasks in some scenarios (e.g. executor-only
> impalad) or that some tasks could be safely moved to a later point when they
> are actually needed.
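
To illustrate the idea of moving independent tasks to separate threads, here is
a generic Python sketch. It is not Impala code (the real startup path is C++
and Java), and the task names and durations are hypothetical stand-ins. It only
shows that when tasks do not depend on each other, running them concurrently
makes startup cost roughly the longest task instead of the sum of all tasks.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_task(name, seconds):
    """Build a hypothetical init task; the sleep stands in for real work."""
    def task():
        time.sleep(seconds)
        return name
    return task


# Hypothetical, mutually independent startup tasks.
TASKS = [
    make_task("task_a", 0.1),
    make_task("task_b", 0.1),
    make_task("task_c", 0.1),
]


def run_serial(tasks):
    """Run tasks one after another, as a single-threaded startup would."""
    start = time.perf_counter()
    results = [task() for task in tasks]
    return results, time.perf_counter() - start


def run_parallel(tasks):
    """Run independent tasks on separate threads and wait for all of them."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task) for task in tasks]
        # result() blocks until the task finishes and re-raises its exception, if any.
        results = [future.result() for future in futures]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    serial_results, serial_time = run_serial(TASKS)
    parallel_results, parallel_time = run_parallel(TASKS)
    print(f"serial:   {serial_time:.2f}s {serial_results}")
    print(f"parallel: {parallel_time:.2f}s {parallel_results}")
```

The gain depends on the tasks really being independent. Anything that needs
the result of an earlier step still has to wait for it, so the dependency
order would have to be worked out first.
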
> Initialization is driven mainly from here:
> https://github.com/apache/impala/blob/master/be/src/service/impalad-main.cc
> https://github.com/apache/impala/blob/master/be/src/catalog/catalogd-main.cc
> but most of the time is probably spent in Java code.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)