[ 
https://issues.apache.org/jira/browse/IMPALA-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896665#comment-16896665
 ] 

Tim Armstrong commented on IMPALA-8816:
---------------------------------------

A large part of the problem is that start-impala-cluster.py is not designed to 
support this kind of startup failure test. Some startup failure tests end up 
waiting for CLUSTER_WAIT_TIMEOUT_IN_SECONDS for a metric to increment. Which 
timeout a given run hits is non-deterministic. It seems like much of the time 
the tests fail on a shorter timeout when the processes fail to appear here: 
https://github.com/apache/impala/blob/72c9370856d7436885adbee3e8da7e7d9336df15/tests/common/impala_cluster.py#L166
The unlucky runs get stuck waiting for the longer metric timeout instead.
{noformat}
custom_cluster.test_query_event_hooks.TestHooksStartupFail.test_hook_startup_fail
 (from pytest)
Took 4 min 14 sec.
Standard Error

-- 2019-07-30 22:53:23,755 INFO     MainThread: Starting cluster with command: 
/home/ubuntu/Impala/bin/start-impala-cluster.py --cluster_size=1 
--num_coordinators=1 
--log_dir=/home/ubuntu/Impala/logs/custom_cluster_tests/test_hooks_startup_fail_uLTriL
 --log_level=1 
'--impalad_args=--query_event_hook_classes=org.apache.impala.testutil.AlwaysErrorQueryEventHook
 --minidump_path=/tmp/tmpBDmylN ' 
'--state_store_args=--statestore_update_frequency_ms=50     
--statestore_priority_update_frequency_ms=50 
...[truncated 14624 chars]...
ome/ubuntu/Impala/tests/common/impala_cluster.py", line 174, in wait_until_ready
    timeout=CLUSTER_WAIT_TIMEOUT_IN_SECONDS, interval=2)
  File "/home/ubuntu/Impala/tests/common/impala_service.py", line 270, in 
wait_for_num_known_live_backends
    assert 0, 'num_known_live_backends did not reach expected value in time'
AssertionError: num_known_live_backends did not reach expected value in time
-- 2019-07-30 22:57:29,083 DEBUG    MainThread: Found 0 impalad/1 statestored/1 
catalogd process(es)
{noformat}

I think we could fix this entirely by specialising start-impala-cluster.py to 
have an "expected startup failure" mode that waits for the number of impalad 
processes to hit 0, which should be quick, and then throws an exception.
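
A rough sketch of what that mode might look like, as a standalone polling 
loop. Everything here is hypothetical: the function name, the two exception 
classes, the default timeout and the count_impalads callable are all 
placeholders for illustration, not existing code in start-impala-cluster.py 
or impala_cluster.py.

```python
import time


class ExpectedStartupFailure(Exception):
    """Hypothetical: raised once every impalad has exited, as the test expects."""


class StartupDidNotFail(Exception):
    """Hypothetical: raised if impalads are still alive when the timeout expires."""


def wait_for_expected_startup_failure(count_impalads, timeout_s=30.0,
                                      interval_s=0.5, clock=time.monotonic,
                                      sleep=time.sleep):
    """Poll until no impalad processes remain, then raise.

    count_impalads is an assumed callable returning the number of impalad
    processes currently running. The loop never waits on a metric, so the
    only timeout involved is the (short) timeout_s passed in here.
    """
    deadline = clock() + timeout_s
    while True:
        remaining = count_impalads()
        if remaining == 0:
            # The cluster failed to start, which is what the test wanted.
            raise ExpectedStartupFailure(
                "all impalad processes exited during startup")
        if clock() >= deadline:
            raise StartupDidNotFail(
                "%d impalad process(es) still running after %.1fs"
                % (remaining, timeout_s))
        sleep(interval_s)
```

A startup failure test would then catch ExpectedStartupFailure and treat 
StartupDidNotFail as a real test failure, so neither path depends on 
CLUSTER_WAIT_TIMEOUT_IN_SECONDS.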

> custom cluster tests in precommit are taking close to 2 hours
> -------------------------------------------------------------
>
>                 Key: IMPALA-8816
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8816
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 3.3.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>
> This is affecting precommit times substantially. We should either speed up 
> the tests or, more likely, move some to exhaustive.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
