[
https://issues.apache.org/jira/browse/IMPALA-14602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042827#comment-18042827
]
Jason Fehr commented on IMPALA-14602:
-------------------------------------
This test failure is extremely rare considering this test has been run in ASAN
builds for 3 months. The "test_cancel_during_admission_control" test works by
starting a long running query which takes the only available admission control
execution slot followed by starting a second query that will fail since there
are no admission control slots available. Since the test failed with "expected
child spans count: 5, actual: 6", that indicates the second query that should
have been blocked at admission control actually did not get blocked. If the
query had been blocked, then only 5 spans would have been present in the trace.
Instead, the 6th QueryExecution span was present indicating the second query
actually executed before the first query started.
The issue seems to be because the test does not wait for the first long-running
query to start. It async runs the first query followed by immediately starting
the second query.
> test_cancel_during_admission_control is flaky
> ---------------------------------------------
>
> Key: IMPALA-14602
> URL: https://issues.apache.org/jira/browse/IMPALA-14602
> Project: IMPALA
> Issue Type: Bug
> Components: Test
> Affects Versions: Impala 5.0.0
> Reporter: Riza Suminto
> Assignee: Jason Fehr
> Priority: Major
> Labels: broken-build
>
> custom_cluster/test_otel_trace.py::TestOtelTraceSelectQueued::test_cancel_during_admission_control
> is flaky in downstream build with following error.
> {code:java}
> Error Message
> AssertionError: Trace '4981634b898085aa804e80969f0e66c7' expected child spans
> count: 5, actual: 6
> Stacktrace
> custom_cluster/test_otel_trace.py:511: in test_cancel_during_admission_control
> self.assert_trace(
> custom_cluster/test_otel_trace.py:53: in assert_trace
> assert_trace(self.build_log_path("impalad", "INFO"), self.trace_file_path,
> util/otel_trace.py:382: in assert_trace
> __assert_trace_common(trace, expected_span_count)
> util/otel_trace.py:485: in __assert_trace_common
> assert len(trace.child_spans) == expected_child_spans_count, \
> E AssertionError: Trace '4981634b898085aa804e80969f0e66c7' expected child
> spans count: 5, actual: 6
> Standard Error
> -- 2025-12-02 16:13:37,819 INFO MainThread: Created temporary dir
> /data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impala_test_out_dir_traces_jkumhes1
> -- 2025-12-02 16:13:37,819 INFO MainThread: Starting cluster with
> command:
> /data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/bin/start-impala-cluster.py
> '--state_store_args=--statestore_update_frequency_ms=50
> --statestore_priority_update_frequency_ms=50
> --statestore_heartbeat_frequency_ms=50' --cluster_size=1 --num_coordinators=1
> --log_dir=/data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests
> --log_level=1 '--impalad_args=--logbuflevel=-1 -v=2
> --cluster_id=cancel_adm_ctl --default_pool_max_requests=1
> --otel_trace_enabled=true --otel_trace_exporter=file
> --otel_trace_exhaustive_dchecks --otel_file_flush_interval_ms=500
> --otel_file_pattern=/data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impala_test_out_dir_traces_jkumhes1/export-trace.jsonl
> ' '--state_store_args=--logbuflevel=-1 ' '--catalogd_args=--logbuflevel=-1
> ' '--admissiond_args=--logbuflevel=-1 '
> --impalad_args=--default_query_options=
> 16:13:38 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 16:13:38 MainThread: Starting State Store logging to
> /data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 16:13:39 MainThread: Starting Catalog Service logging to
> /data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 16:13:40 MainThread: Starting Impala Daemon logging to
> /data/jenkins/workspace/impala-cdw-master-staging-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 16:13:42 MainThread: Found 1 impalad/1 statestored/1 catalogd process(es)
> 16:13:42 MainThread: Waiting for Impalad webserver port 25000
> 16:13:43 MainThread: Waiting for Impalad webserver port 25000
> 16:13:43 MainThread: Waiting for Impalad webserver port 25000
> 16:13:44 MainThread: Waiting for Impalad webserver port 25000
> 16:13:46 MainThread: Waiting for coordinator client services - hs2 port:
> 21050 hs2-http port: 28000 beeswax port: 21000
> 16:13:46 MainThread: Getting num_known_live_backends from
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:25000
> 16:13:46 MainThread: num_known_live_backends has reached value: 1
> 16:13:46 MainThread: Total wait: 3.65s
> 16:13:46 MainThread: Impala Cluster Running with 1 nodes (1 coordinators, 1
> executors).
> -- 2025-12-02 16:13:46,212 DEBUG MainThread: Found 1 impalad/1
> statestored/1 catalogd process(es)
> -- 2025-12-02 16:13:46,212 INFO MainThread: Getting metric:
> statestore.live-backends from
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:25010
> -- 2025-12-02 16:13:46,218 INFO MainThread: Metric
> 'statestore.live-backends' has reached desired value: 2. total_wait: 0s
> -- 2025-12-02 16:13:46,218 DEBUG MainThread: Getting
> num_known_live_backends from
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:25000
> -- 2025-12-02 16:13:46,222 INFO MainThread: num_known_live_backends has
> reached value: 1
> -- 2025-12-02 16:13:46,223 INFO MainThread: beeswax:
> set
> client_identifier=custom_cluster/test_otel_trace.py::TestOtelTraceSelectQueued::()::test_cancel_during_admission_control;
> -- 2025-12-02 16:13:46,223 INFO MainThread: beeswax: connected to
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:21000 with
> beeswax
> -- 2025-12-02 16:13:46,223 INFO MainThread: hs2:
> set
> client_identifier=custom_cluster/test_otel_trace.py::TestOtelTraceSelectQueued::()::test_cancel_during_admission_control;
> -- 2025-12-02 16:13:46,223 INFO MainThread: hs2: connected to
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:21050 with
> impyla hs2
> -- 2025-12-02 16:13:46,223 INFO MainThread: hs2-http:
> set
> client_identifier=custom_cluster/test_otel_trace.py::TestOtelTraceSelectQueued::()::test_cancel_during_admission_control;
> -- 2025-12-02 16:13:46,225 INFO MainThread: hs2-http: connected to
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:28000 with
> impyla hs2-http
> -- 2025-12-02 16:13:46,225 INFO MainThread: hs2-feng:
> set
> client_identifier=custom_cluster/test_otel_trace.py::TestOtelTraceSelectQueued::()::test_cancel_during_admission_control;
> -- 2025-12-02 16:13:46,225 INFO MainThread: hs2-feng: connected to
> impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:11050 with
> impyla hs2-feng
> -- 2025-12-02 16:13:46,230 INFO MainThread: hs2: executing against Impala
> at impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:21050.
> session: bd44d4109d8f25f7:1150444731f78e98 main_cursor: False user: None
> SELECT * FROM functional.alltypes WHERE id = SLEEP(5000);
> -- 2025-12-02 16:13:47,578 INFO MainThread: hs2: executing against Impala
> at impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:21050.
> session: 2b4fa7e01448a74b:3d59c1baf1f27a87 main_cursor: False user: None
> SELECT * FROM functional.alltypes;
> -- 2025-12-02 16:13:47,608 INFO MainThread:
> cc4a90f7c66ca08f:4b6e0abb00000000: getting state
> -- 2025-12-02 16:13:47,609 INFO MainThread: hs2: executing against Impala
> at impala-ec2-rhel92-m7g-4xlarge-ondemand-1b73.vpc.cloudera.com:21050.
> session: 424e33f2c059b306:d5918c0e6270d6b4 main_cursor: True user: None
> KILL QUERY 'cc4a90f7c66ca08f:4b6e0abb00000000';
> -- 2025-12-02 16:13:47,624 INFO MainThread:
> d04417f3822cfb69:986d6a2c00000000: query started
> -- 2025-12-02 16:13:47,626 INFO MainThread:
> d04417f3822cfb69:986d6a2c00000000: getting log for operation
> -- 2025-12-02 16:13:47,627 INFO MainThread:
> d04417f3822cfb69:986d6a2c00000000: getting runtime profile operation
> -- 2025-12-02 16:13:47,627 INFO MainThread:
> d04417f3822cfb69:986d6a2c00000000: closing query for operation
> -- 2025-12-02 16:13:47,628 INFO MainThread:
> cc4a90f7c66ca08f:4b6e0abb00000000: getting state
> -- 2025-12-02 16:13:47,628 INFO MainThread:
> cc4a90f7c66ca08f:4b6e0abb00000000: getting runtime profile operation
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]