[
https://issues.apache.org/jira/browse/IMPALA-11729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887626#comment-17887626
]
ASF subversion and git services commented on IMPALA-11729:
----------------------------------------------------------
Commit 72aaa6dc27bf32c973055a782aeaa2270c66c038 in impala's branch
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=72aaa6dc2 ]
IMPALA-11729: Speed up start-impala-cluster.py
The change reduces cluster startup time by 1-2 seconds, which also
makes custom cluster tests a bit quicker.
Most of the improvement comes from removing an unneeded sleep from
wait_for_catalog(): it slept even after a successful connection.
Once the first coordinator is up, the others are likely up as well,
so this meant 3*0.5s of extra sleep in the dev cluster.
Other changes:
- wait_for_catalog is cleaned up and renamed to
  wait_for_coordinator_services
- it now also waits for the hs2_http port to be open
- decreased some sleep intervals
- removed some non-informative logging
- wait for the hs2/beeswax/webui ports to be open before trying
  to actually connect to them, to avoid extra logging from
  failed Thrift/http connections
- reordered startup to first wait for the coordinators to be up
  and then wait for num_known_live_backends in each impalad; this
  better reflects what the cluster actually waits for (the 1st catalog
  update before starting coordinator services)
Change-Id: Ic4dd8c2bc7056443373ceb256a03ce562fea38a0
Reviewed-on: http://gerrit.cloudera.org:8080/21656
Tested-by: Impala Public Jenkins
Reviewed-by: Michael Smith
Reviewed-by: Laszlo Gaal
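The commit message above describes waiting for the service ports to be open before connecting, and not sleeping after a successful probe. The following is only a minimal sketch of that idea in Python 3, not the code from start-impala-cluster.py: the helper names port_is_open and wait_for_ports, the timeout, and the poll interval are all hypothetical.

```python
import socket
import time


def port_is_open(host, port, timeout_s=0.5):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_for_ports(host, ports, timeout_s=30.0, poll_interval_s=0.1):
    """Block until every port in `ports` accepts TCP connections.

    Sleeps only after a round in which some port was still closed,
    never after a fully successful round. Raises TimeoutError if the
    deadline passes while ports are still closed.
    """
    deadline = time.monotonic() + timeout_s
    pending = list(ports)
    while pending:
        # Keep only the ports that are still closed.
        pending = [p for p in pending if not port_is_open(host, p)]
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("ports still closed: %s" % pending)
        time.sleep(poll_interval_s)
```

A caller would probe the ports first and only then open the real Thrift or HTTP connection, so that a service that is not up yet does not produce failed-connection log noise.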
> Investigate and improve impalad startup time
> --------------------------------------------
>
> Key: IMPALA-11729
> URL: https://issues.apache.org/jira/browse/IMPALA-11729
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Csaba Ringhofer
> Priority: Minor
> Labels: ramp-up
>
> impalad startup takes several seconds, including a few seconds before it even
> tries to connect to statestored. From a test run (release mode) with a parallel
> catalogd startup:
> {code}
> I1113 21:02:17.334743 4363 logging.cc:247] stdout will be logged to this
> file.
> I1113 21:02:18.968991 4363 JniFrontend.java:141] Java Input arguments:
> I1113 21:02:19.887519 4363 exec-env.cc:467] Starting statestore subscriber
> service
> {code}
> After connecting to the statestore, coordinators need to wait for the initial
> catalog update, and processing it takes time depending on the number of
> catalog objects:
> {code}
> I1113 21:02:19.888423 4363 Frontend.java:1618] Waiting for local catalog to
> be initialized, attempt: 0
> I1113 21:02:21.888621 4363 Frontend.java:1618] Waiting for local catalog to
> be initialized, attempt: 1
> I1113 21:02:23.888849 4363 Frontend.java:1614] Local catalog initialized
> after: 4000 ms.
> I1113 21:02:23.890105 4363 impala-server.cc:3103] Impala has started.
> {code}
> Meanwhile, catalogd takes about 2 seconds before it even tries to connect to HMS:
> {code}
> I1113 21:02:17.289606 4281 logging.cc:247] stdout will be logged to this
> file.
> I1113 21:02:19.023339 4281 HiveMetaStoreClient.java:720] Trying to connect
> to metastore with URI (thrift://localhost:9083) in binary transport mode
> I1113 21:02:21.671665 5028 catalog-server.cc:400] A catalog update with 1647
> entries is assembled. Catalog version: 1649 Last sent catalog version: 0
> {code}
> The statestore starts up quickly, well before other components try to connect to
> it:
> {code}
> I1113 21:02:17.263167 4262 logging.cc:247] stdout will be logged to this
> file.
> I1113 21:02:17.268682 4262 thrift-server.cc:419] ThriftServer
> 'StatestoreService' started on port: 24000
> I1113 21:02:19.670817 4285 TAcceptQueueServer.cpp:355] New connection to
> server StatestoreService from client <Host: 127.0.0.1 Port: 44156>
> {code}
> While these 6 secs at impalad, with ~2 secs spent waiting for the initial catalog
> update, are not very bad, making startup quicker would be visible in test run times
> (custom cluster tests restart the cluster a lot) and in autoscaling scenarios.
> Finding out what takes the time during startup would also be a nice ramp-up
> task.
> The startup logic is single-threaded. I see the most potential in moving
> some independent tasks to separate threads. It is also possible that we are
> doing some completely unnecessary tasks in some scenarios (e.g. executor-only
> impalad) or that some tasks could be safely moved to a later point when they
> are actually needed.
> Initialization is driven mainly from here:
> https://github.com/apache/impala/blob/master/be/src/service/impalad-main.cc
> https://github.com/apache/impala/blob/master/be/src/catalog/catalogd-main.cc
> but probably most of the time is spent in Java code.
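The phase durations discussed in the issue can be read off the log timestamps. As a rough sketch of how to compute them, the Python 3 snippet below parses glog-style lines and reports the gap between consecutive entries. The regular expression is an assumption based only on the prefixes visible in the quoted excerpts, and the function names are hypothetical. The sample lines are three impalad entries from the excerpts, with the mail line wrapping undone.

```python
import re
from datetime import datetime

# Assumed prefix shape, taken from the excerpts in this issue:
#   I1113 21:02:17.334743 4363 logging.cc:247] message
GLOG_RE = re.compile(
    r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] (.*)$")


def parse_glog_line(line):
    """Return (timestamp, message), or None if the line does not match."""
    m = GLOG_RE.match(line)
    if m is None:
        return None
    # The log prefix carries no year; a fixed placeholder year is enough
    # for computing deltas within a single run.
    ts = datetime.strptime("2000" + m.group(1) + " " + m.group(2),
                           "%Y%m%d %H:%M:%S.%f")
    return ts, m.group(3)


def phase_durations(lines):
    """Return (seconds, earlier_message, later_message) for each
    pair of consecutive parsable log lines."""
    events = [e for e in map(parse_glog_line, lines) if e is not None]
    return [((later[0] - earlier[0]).total_seconds(), earlier[1], later[1])
            for earlier, later in zip(events, events[1:])]


impalad_log = [
    "I1113 21:02:17.334743 4363 logging.cc:247] stdout will be logged to this file.",
    "I1113 21:02:19.887519 4363 exec-env.cc:467] Starting statestore subscriber service",
    "I1113 21:02:23.890105 4363 impala-server.cc:3103] Impala has started.",
]

for seconds, earlier, later in phase_durations(impalad_log):
    print("%.3fs  %s -> %s" % (seconds, earlier, later))
```

For these three lines the two gaps are about 2.55 s and 4.00 s, roughly 6.5 s in total, in line with the "6 secs at impalad" mentioned in the description.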