[ 
https://issues.apache.org/jira/browse/IMPALA-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416095#comment-17416095
 ] 

Qifan Chen edited comment on IMPALA-10895 at 9/16/21, 12:55 PM:
----------------------------------------------------------------

This bug was found recurring in a core asan test recently. 


{code:java}
custom_cluster/test_query_retries.py:742: in test_retrying_query_cancel
    assert retry_status.group(1) == 'RETRYING'
E   assert 'RETRIED' == 'RETRYING'
E     - RETRIED
E     + RETRYING

SET 
client_identifier=custom_cluster/test_query_retries.py::TestQueryRetries::()::test_retrying_query_cancel;
SET retry_failed_queries=true;
-- executing async: localhost:21000

select count(*) from tpch_parquet.lineitem;
{code}



was (Author: sql_forever):
This bug was found recurring in a core asan test recently. 

custom_cluster/test_query_retries.py:742: in test_retrying_query_cancel
    assert retry_status.group(1) == 'RETRYING'
E   assert 'RETRIED' == 'RETRYING'
E     - RETRIED
E     + RETRYING

SET 
client_identifier=custom_cluster/test_query_retries.py::TestQueryRetries::()::test_retrying_query_cancel;
SET retry_failed_queries=true;
-- executing async: localhost:21000

select count(*) from tpch_parquet.lineitem;

> TestQueryRetries.test_retrying_query_cancel is flaky
> ----------------------------------------------------
>
>                 Key: IMPALA-10895
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10895
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: broken-build
>         Attachments: catalogd.INFO.gz, impalad.INFO.gz, 
> impalad_node1.INFO.gz, impalad_node2.INFO.gz, statestored.INFO.gz
>
>
> Saw this failed in an ASAN build:
> {code}
> custom_cluster.test_query_retries.TestQueryRetries.test_retrying_query_cancel
> {code}
> Stacktrace
> {code}
> custom_cluster/test_query_retries.py:742: in test_retrying_query_cancel
>     assert retry_status.group(1) == 'RETRYING'
> E   assert 'RETRIED' == 'RETRYING'
> E     - RETRIED
> E     + RETRYING
> {code}
> Standard Error
> {code}
> -- 2021-08-29 08:29:29,112 INFO     MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50     
> --statestore_priority_update_frequency_ms=50     
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 
> '--impalad_args=--debug_actions=RETRY_DELAY_CHECKING_ORIGINAL_DRIVER:SLEEP@1000
>  ' '--state_store_args=--statestore_heartbeat_frequency_ms=60000 ' 
> --impalad_args=--default_query_options=
> 08:29:29 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 08:29:29 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 08:29:29 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 08:29:29 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 08:29:29 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 08:29:29 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 08:29:32 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:29:32 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:29:32 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25000
> 08:29:32 MainThread: Debug webpage not yet available: 
> HTTPConnectionPool(host='impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com',
>  port=25000): Max retries exceeded with url: /backends?json (Caused by 
> NewConnectionError('<urllib3.connection.HTTPConnection object at 
> 0x7f10c1d5d150>: Failed to establish a new connection: [Errno 111] Connection 
> refused',))
> 08:29:34 MainThread: Debug webpage did not become available in expected time.
> 08:29:34 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 08:29:35 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:29:35 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25000
> 08:29:35 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 08:29:36 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:29:36 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25000
> 08:29:36 MainThread: num_known_live_backends has reached value: 3
> 08:29:37 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:29:37 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25001
> 08:29:37 MainThread: num_known_live_backends has reached value: 3
> 08:29:38 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:29:38 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25002
> 08:29:38 MainThread: num_known_live_backends has reached value: 3
> 08:29:38 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 
> executors).
> -- 2021-08-29 08:29:38,626 DEBUG    MainThread: Found 3 impalad/1 
> statestored/1 catalogd process(es)
> -- 2021-08-29 08:29:38,626 INFO     MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25010
> -- 2021-08-29 08:29:38,629 INFO     MainThread: Metric 
> 'statestore.live-backends' has reached desired value: 4
> -- 2021-08-29 08:29:38,629 DEBUG    MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25000
> -- 2021-08-29 08:29:38,631 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> -- 2021-08-29 08:29:38,631 DEBUG    MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25001
> -- 2021-08-29 08:29:38,633 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> -- 2021-08-29 08:29:38,633 DEBUG    MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-061e.vpc.cloudera.com:25002
> -- 2021-08-29 08:29:38,635 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> SET 
> client_identifier=custom_cluster/test_query_retries.py::TestQueryRetries::()::test_retrying_query_cancel;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2021-08-29 08:29:38,801 INFO     MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2021-08-29 08:29:38,822 INFO     MainThread: Closing active operation
> -- 2021-08-29 08:29:38,853 INFO     MainThread: Killing <ImpaladProcess PID: 
> 6501 
> (/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/latest/service/impalad
>  -disconnected_session_timeout 21600 -kudu_client_rpc_timeout_ms 60000 
> -kudu_master_hosts localhost -mem_limit=12884901888 -logbufsecs=5 -v=1 
> -max_log_files=0 -log_filename=impalad_node1 
> -log_dir=/data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/logs/custom_cluster_tests
>  -beeswax_port=21001 -hs2_port=21051 -hs2_http_port=28001 -krpc_port=27001 
> -state_store_subscriber_port=23001 -webserver_port=25001 
> --debug_actions=RETRY_DELAY_CHECKING_ORIGINAL_DRIVER:SLEEP@1000 
> --default_query_options=)> with signal 9
> SET 
> client_identifier=custom_cluster/test_query_retries.py::TestQueryRetries::()::test_retrying_query_cancel;
> SET retry_failed_queries=true;
> -- executing async: localhost:21000
> select count(*) from tpch_parquet.lineitem;
> -- 2021-08-29 08:29:39,647 INFO     MainThread: Started query 
> de46b6f56f935fa5:f4c701b200000000
> -- canceling operation: <tests.common.impala_connection.OperationHandle 
> object at 0x7fb2a8026250>
> {code}
> CC [~xqhe]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to