[
https://issues.apache.org/jira/browse/IMPALA-8816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896665#comment-16896665
]
Tim Armstrong commented on IMPALA-8816:
---------------------------------------
A large part of the problem is that start-impala-cluster.py is not designed to
support this kind of startup failure test. Some startup failure tests end up
waiting the full CLUSTER_WAIT_TIMEOUT_IN_SECONDS for a metric to increment, and
which tests hit that path is non-deterministic. It seems like much of the time
the tests fail on a shorter timeout, when the processes fail to appear here:
https://github.com/apache/impala/blob/72c9370856d7436885adbee3e8da7e7d9336df15/tests/common/impala_cluster.py#L166
But unlucky ones get stuck waiting for the longer metric timeout.
{noformat}
custom_cluster.test_query_event_hooks.TestHooksStartupFail.test_hook_startup_fail
(from pytest)
Took 4 min 14 sec.
Standard Error
-- 2019-07-30 22:53:23,755 INFO MainThread: Starting cluster with command:
/home/ubuntu/Impala/bin/start-impala-cluster.py --cluster_size=1
--num_coordinators=1
--log_dir=/home/ubuntu/Impala/logs/custom_cluster_tests/test_hooks_startup_fail_uLTriL
--log_level=1
'--impalad_args=--query_event_hook_classes=org.apache.impala.testutil.AlwaysErrorQueryEventHook
--minidump_path=/tmp/tmpBDmylN '
'--state_store_args=--statestore_update_frequency_ms=50
--statestore_priority_update_frequency_ms=50
...[truncated 14624 chars]...
ome/ubuntu/Impala/tests/common/impala_cluster.py", line 174, in wait_until_ready
timeout=CLUSTER_WAIT_TIMEOUT_IN_SECONDS, interval=2)
File "/home/ubuntu/Impala/tests/common/impala_service.py", line 270, in
wait_for_num_known_live_backends
assert 0, 'num_known_live_backends did not reach expected value in time'
AssertionError: num_known_live_backends did not reach expected value in time
-- 2019-07-30 22:57:29,083 DEBUG MainThread: Found 0 impalad/1 statestored/1
catalogd process(es)
{noformat}
I think we could fix this entirely by specialising start-impala-cluster.py to
have an "expected startup failure" mode that waits for the number of impalads
to hit 0, which should be quick, and then throws an exception.
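
A rough sketch of what such a mode could look like, not the actual
start-impala-cluster.py code. All names here (wait_for_expected_startup_failure,
count_impalads, ExpectedStartupFailure, StartupDidNotFail) and the default
timeout values are hypothetical; count_impalads stands in for whatever the
script would use to count live impalad processes.

```python
import time


class ExpectedStartupFailure(Exception):
    """Hypothetical: raised once every impalad has exited, as the test expected."""


class StartupDidNotFail(Exception):
    """Hypothetical: raised if impalads are still running when the timeout expires."""


def wait_for_expected_startup_failure(count_impalads, timeout_s=30.0, interval_s=0.5):
    """Poll until no impalad processes remain, then raise ExpectedStartupFailure.

    count_impalads is a caller-supplied callable returning the number of
    impalad processes currently running (an assumption of this sketch).
    """
    deadline = time.time() + timeout_s
    while True:
        remaining = count_impalads()
        if remaining == 0:
            # The failure we were waiting for: report it right away instead of
            # going on to wait for a metric that can never reach its value.
            raise ExpectedStartupFailure("all impalads exited during startup")
        if time.time() >= deadline:
            raise StartupDidNotFail(
                "%d impalad(s) still running after %.1fs" % (remaining, timeout_s))
        time.sleep(interval_s)
```

The point of the short poll interval and separate timeout is that the test
would only ever wait as long as it takes the impalads to die, rather than
falling through to CLUSTER_WAIT_TIMEOUT_IN_SECONDS.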
> custom cluster tests in precommit are taking close to 2 hours
> -------------------------------------------------------------
>
> Key: IMPALA-8816
> URL: https://issues.apache.org/jira/browse/IMPALA-8816
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 3.3.0
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
>
> This is affecting precommit times substantially. We should either speed up
> the tests or, more likely, move some to exhaustive.