[ 
https://issues.apache.org/jira/browse/IMPALA-11729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17887626#comment-17887626
 ] 

ASF subversion and git services commented on IMPALA-11729:
----------------------------------------------------------

Commit 72aaa6dc27bf32c973055a782aeaa2270c66c038 in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=72aaa6dc2 ]

IMPALA-11729: Speed up start-impala-cluster.py

The change reduces cluster startup time by 1-2 seconds. This also
makes custom cluster tests a bit quicker.

Most of the improvement comes from removing an unneeded sleep from
wait_for_catalog(): it slept even after a successful connection.
Once the first coordinator is up, the others are likely up as well,
so this meant 3*0.5s of extra sleep in the dev cluster.

Other changes:
- wait_for_catalog is cleaned up and renamed to
  wait_for_coordinator_services
- also wait for hs2_http port to be open
- decreased some sleep intervals
- removed some non-informative logging
- wait for hs2/beeswax/webui ports to be open before trying
  to actually connect to them to avoid extra logging from
  failed Thrift/http connections
- reordered startup to first wait for coordinators to be up and
  then wait for num_known_live_backends in each impalad - this
  better reflects what the cluster actually waits for (the 1st
  catalog update arrives before coordinator services start)
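
A minimal sketch of the port-waiting idea described above, not the code
from this commit. The helper names wait_for_port and wait_for_ports, the
timeouts, and the polling interval are hypothetical. It shows the two
points the message makes: probe the TCP port before attempting a real
Thrift/HTTP connection, and sleep only after a failed attempt, never
after a successful one.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=10.0, interval_s=0.1):
    """Return True once a TCP connect to (host, port) succeeds.

    Returns False if the port is still closed when timeout_s elapses.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                # Success: return right away, with no trailing sleep.
                return True
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return False
        # Sleep only after a failed attempt.
        time.sleep(interval_s)


def wait_for_ports(host, ports, timeout_s=10.0):
    """Wait for every port in `ports` to open, sharing one overall deadline."""
    deadline = time.monotonic() + timeout_s
    for port in ports:
        remaining = max(0.0, deadline - time.monotonic())
        if not wait_for_port(host, port, timeout_s=remaining):
            return False
    return True
```

A caller would pass the beeswax, hs2, hs2_http and webui ports of one
impalad, for example wait_for_ports("localhost", [21000, 21050, 28000,
25000]); those numbers are assumed defaults and depend on how the
cluster was started.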

Change-Id: Ic4dd8c2bc7056443373ceb256a03ce562fea38a0
Reviewed-on: http://gerrit.cloudera.org:8080/21656
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
Reviewed-by: Laszlo Gaal <[email protected]>


> Investigate and improve impalad startup time
> --------------------------------------------
>
>                 Key: IMPALA-11729
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11729
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Csaba Ringhofer
>            Priority: Minor
>              Labels: ramp-up
>
> impalad startup takes several seconds, including a few seconds before it
> even tries to connect to statestored. From a test run (release mode) with a
> parallel catalogd startup:
> {code}
> I1113 21:02:17.334743  4363 logging.cc:247] stdout will be logged to this file.
> I1113 21:02:18.968991  4363 JniFrontend.java:141] Java Input arguments:
> I1113 21:02:19.887519  4363 exec-env.cc:467] Starting statestore subscriber service
> {code}
> After connecting to statestore, coordinators need to wait for the initial
> catalog update, and processing it takes time that depends on the number of
> catalog objects:
> {code}
> I1113 21:02:19.888423  4363 Frontend.java:1618] Waiting for local catalog to be initialized, attempt: 0
> I1113 21:02:21.888621  4363 Frontend.java:1618] Waiting for local catalog to be initialized, attempt: 1
> I1113 21:02:23.888849  4363 Frontend.java:1614] Local catalog initialized after: 4000 ms.
> I1113 21:02:23.890105  4363 impala-server.cc:3103] Impala has started.
> {code}
> Meanwhile on catalogd it takes 2 seconds before even trying to connect to HMS:
> {code}
> I1113 21:02:17.289606  4281 logging.cc:247] stdout will be logged to this file.
> I1113 21:02:19.023339  4281 HiveMetaStoreClient.java:720] Trying to connect to metastore with URI (thrift://localhost:9083) in binary transport mode
> I1113 21:02:21.671665  5028 catalog-server.cc:400] A catalog update with 1647 entries is assembled. Catalog version: 1649 Last sent catalog version: 0
> {code}
> Statestore starts up quickly, well before other components try to connect to
> it:
> {code}
> I1113 21:02:17.263167  4262 logging.cc:247] stdout will be logged to this file.
> I1113 21:02:17.268682  4262 thrift-server.cc:419] ThriftServer 'StatestoreService' started on port: 24000
> I1113 21:02:19.670817  4285 TAcceptQueueServer.cpp:355] New connection to server StatestoreService from client <Host: 127.0.0.1 Port: 44156>
> {code}
> While these 6 secs at impalad, with ~2 secs waiting for the initial catalog
> update, are not very bad, making startup quicker would be visible in test run
> times (custom cluster tests restart the cluster a lot) and in autoscaling
> scenarios. Finding out what takes the time during startup would also be a
> nice ramp-up task.
> The startup logic is single threaded - I see the most potential in moving
> some independent tasks to separate threads. It is also possible that we are
> doing some completely unnecessary tasks in some scenarios (e.g. executor-only
> impalad) or that some tasks could be safely moved to a later point when they
> are actually needed.
> Initialization is driven mainly from here:
> https://github.com/apache/impala/blob/master/be/src/service/impalad-main.cc
> https://github.com/apache/impala/blob/master/be/src/catalog/catalogd-main.cc
> but most of the time is probably spent in Java code.
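
As a starting point for finding out where startup time goes, the gaps
between consecutive log lines can be computed from the glog-style
timestamps shown in the excerpts above. This is a rough sketch under
assumptions: the helper name glog_gaps is hypothetical, each log entry
is expected on a single line, and the year is a placeholder because the
log prefix does not carry one.

```python
import re
from datetime import datetime

# Matches lines such as:
# I1113 21:02:17.334743  4363 logging.cc:247] stdout will be logged to this file.
GLOG_RE = re.compile(
    r"^[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] (.*)$")


def glog_gaps(lines, year=2000):
    """Yield (seconds_since_previous_line, message) for each parsable line.

    Lines that do not match the glog prefix are skipped. `year` is only a
    placeholder needed to build a datetime; it does not affect the gaps
    as long as all lines come from the same year.
    """
    prev = None
    for line in lines:
        m = GLOG_RE.match(line)
        if m is None:
            continue
        ts = datetime.strptime("%d%s" % (year, m.group(1)),
                               "%Y%m%d %H:%M:%S.%f")
        gap = 0.0 if prev is None else (ts - prev).total_seconds()
        prev = ts
        yield gap, m.group(2)
```

Sorting the result by gap in descending order points at the log
messages that follow the longest silent periods, which is where
single-threaded startup work is most likely hiding.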


