Quanlong Huang created IMPALA-10895:
---------------------------------------
Summary: TestQueryRetries.test_retrying_query_cancel is flaky
Key: IMPALA-10895
URL: https://issues.apache.org/jira/browse/IMPALA-10895
Project: IMPALA
Issue Type: Bug
Reporter: Quanlong Huang
Assignee: Quanlong Huang
Saw this failed in an ASAN build:
{code}
custom_cluster.test_query_retries.TestQueryRetries.test_retrying_query_cancel
{code}
Stacktrace
{code}
custom_cluster/test_query_retries.py:742: in test_retrying_query_cancel
assert retry_status.group(1) == 'RETRYING'
E assert 'RETRIED' == 'RETRYING'
E - RETRIED
E + RETRYING
{code}
Standard Error
{code}
-- 2021-08-29 08:29:29,112 INFO MainThread: Starting cluster with command:
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/bin/start-impala-cluster.py
'--state_store_args=--statestore_update_frequency_ms=50
--statestore_priority_update_frequency_ms=50
--statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3
--log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
--log_level=1
'--impalad_args=--debug_actions=RETRY_DELAY_CHECKING_ORIGINAL_DRIVER:SLEEP@1000
' '--state_store_args=--statestore_heartbeat_frequency_ms=60000 '
--impalad_args=--default_query_options=
08:29:29 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
08:29:29 MainThread: Starting State Store logging to
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
08:29:29 MainThread: Starting Catalog Service logging to
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
08:29:29 MainThread: Starting Impala Daemon logging to
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
08:29:29 MainThread: Starting Impala Daemon logging to
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
08:29:29 MainThread: Starting Impala Daemon logging to
/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
08:29:32 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:29:32 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:29:32 MainThread: Getting num_known_live_backends from
impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25000
08:29:32 MainThread: Debug webpage not yet available:
HTTPConnectionPool(host='impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com',
port=25000): Max retries exceeded with url: /backends?json (Caused by
NewConnectionError('<urllib3.connection.HTTPConnection object at
0x7f10c1d5d150>: Failed to establish a new connection: [Errno 111] Connection
refused',))
08:29:34 MainThread: Debug webpage did not become available in expected time.
08:29:34 MainThread: Waiting for num_known_live_backends=3. Current value: None
08:29:35 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:29:35 MainThread: Getting num_known_live_backends from
impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25000
08:29:35 MainThread: Waiting for num_known_live_backends=3. Current value: 0
08:29:36 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:29:36 MainThread: Getting num_known_live_backends from
impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25000
08:29:36 MainThread: num_known_live_backends has reached value: 3
08:29:37 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:29:37 MainThread: Getting num_known_live_backends from
impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25001
08:29:37 MainThread: num_known_live_backends has reached value: 3
08:29:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
08:29:38 MainThread: Getting num_known_live_backends from
impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25002
08:29:38 MainThread: num_known_live_backends has reached value: 3
08:29:38 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3
executors).
-- 2021-08-29 08:29:38,626 DEBUG MainThread: Found 3 impalad/1 statestored/1
catalogd process(es)
-- 2021-08-29 08:29:38,626 INFO MainThread: Getting metric:
statestore.live-backends from
impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25010
-- 2021-08-29 08:29:38,629 INFO MainThread: Metric
'statestore.live-backends' has reached desired value: 4
-- 2021-08-29 08:29:38,629 DEBUG MainThread: Getting num_known_live_backends
from impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25000
-- 2021-08-29 08:29:38,631 INFO MainThread: num_known_live_backends has
reached value: 3
-- 2021-08-29 08:29:38,631 DEBUG MainThread: Getting num_known_live_backends
from impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25001
-- 2021-08-29 08:29:38,633 INFO MainThread: num_known_live_backends has
reached value: 3
-- 2021-08-29 08:29:38,633 DEBUG MainThread: Getting num_known_live_backends
from impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25002
-- 2021-08-29 08:29:38,635 INFO MainThread: num_known_live_backends has
reached value: 3
SET
client_identifier=custom_cluster/test_query_retries.py::TestQueryRetries::()::test_retrying_query_cancel;
-- connecting to: localhost:21000
-- connecting to localhost:21050 with impyla
-- 2021-08-29 08:29:38,801 INFO MainThread: Closing active operation
-- connecting to localhost:28000 with impyla
-- 2021-08-29 08:29:38,822 INFO MainThread: Closing active operation
-- 2021-08-29 08:29:38,853 INFO MainThread: Killing <ImpaladProcess PID:
6501
(/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/latest/service/impalad
-disconnected_session_timeout 21600 -kudu_client_rpc_timeout_ms 60000
-kudu_master_hosts localhost -mem_limit=12884901888 -logbufsecs=5 -v=1
-max_log_files=0 -log_filename=impalad_node1
-log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
-beeswax_port=21001 -hs2_port=21051 -hs2_http_port=28001 -krpc_port=27001
-state_store_subscriber_port=23001 -webserver_port=25001
--debug_actions=RETRY_DELAY_CHECKING_ORIGINAL_DRIVER:SLEEP@1000
--default_query_options=)> with signal 9
SET
client_identifier=custom_cluster/test_query_retries.py::TestQueryRetries::()::test_retrying_query_cancel;
SET retry_failed_queries=true;
-- executing async: localhost:21000
select count(*) from tpch_parquet.lineitem;
-- 2021-08-29 08:29:39,647 INFO MainThread: Started query
de46b6f56f935fa5:f4c701b200000000
-- canceling operation: <tests.common.impala_connection.OperationHandle object
at 0x7fb2a8026250>
{code}
CC [~xqhe]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)