[ 
https://issues.apache.org/jira/browse/IMPALA-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846248#comment-16846248
 ] 

Bikramjeet Vig commented on IMPALA-8570:
----------------------------------------

Based on the logs it seems like the query is scheduled to run on only 2 out of 
the 3 backends
{noformat}
20:10:07.493836 24877 coordinator.cc:355] a745b14e6a951582:ef6b8c9600000000] 
starting execution on 2 backends for query_id=a745b14e6a951582:ef6b8c9600000000
{noformat}
and the impalad which is shutdown as a part of the test is the one which is not 
running the query, so the query never gets cancelled. The test was written 
assuming the query would always be scheduled on all 3 backends, but that 
assumption does not hold true when the test is run on a non standard 
minicluster config (which in this case is because the test ran on a minicluster 
with erasure coding turned on). Best solution here is to make sure it only runs 
on a regular minicluster.

> test_restart_statestore_query_resilience AssertionError: Query expected to 
> fail
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-8570
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8570
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 3.3.0
>            Reporter: Thomas Tauber-Marshall
>            Assignee: Bikramjeet Vig
>            Priority: Blocker
>
> Seen in an erasure coding build
> {noformat}
> Error Message
> AssertionError: Query expected to fail assert False
> Stacktrace
> custom_cluster/test_restart_services.py:129: in 
> test_restart_statestore_query_resilience
>     assert False, "Query expected to fail"
> E   AssertionError: Query expected to fail
> E   assert False
> Standard Output
> Redirecting stdout to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> Redirecting stdout to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> Standard Error
> -- 2019-05-21 20:09:58,348 INFO     MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py
>  --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args=--statestore_subscriber_timeout_seconds=2 
> --failed_backends_query_cancellation_grace_period_ms=5000 ' 
> '--state_store_args=--statestore_update_frequency_ms=50     
> --statestore_priority_update_frequency_ms=50     
> --statestore_heartbeat_frequency_ms=50 ' 
> '--catalogd_args=--statestore_subscriber_timeout_seconds=2 ' 
> --impalad_args=--default_query_options=allow_erasure_coded_files=true
> 20:09:58 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 20:09:58 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 20:09:58 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 20:09:58 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 20:09:58 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 20:10:01 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 20:10:01 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25000
> 20:10:01 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 20:10:02 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25000
> 20:10:02 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 20:10:03 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25000
> 20:10:03 MainThread: num_known_live_backends has reached value: 3
> 20:10:04 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25001
> 20:10:04 MainThread: num_known_live_backends has reached value: 3
> 20:10:05 MainThread: Getting num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25002
> 20:10:05 MainThread: num_known_live_backends has reached value: 3
> 20:10:05 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 
> executors).
> -- 2019-05-21 20:10:05,585 INFO     MainThread: Found 3 impalad/1 
> statestored/1 catalogd process(es)
> -- 2019-05-21 20:10:05,585 INFO     MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25010
> -- 2019-05-21 20:10:05,587 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:05,588 INFO     MainThread: Metric 
> 'statestore.live-backends' has reached desired value: 4
> -- 2019-05-21 20:10:05,588 INFO     MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25000
> -- 2019-05-21 20:10:05,589 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:05,590 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> -- 2019-05-21 20:10:05,590 INFO     MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25001
> -- 2019-05-21 20:10:05,591 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:05,592 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> -- 2019-05-21 20:10:05,592 INFO     MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25002
> -- 2019-05-21 20:10:05,592 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:05,593 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> SET 
> client_identifier=custom_cluster/test_restart_services.py::TestRestart::()::test_restart_statestore_query_resilience;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2019-05-21 20:10:05,749 INFO     MainThread: Closing active operation
> SET 
> client_identifier=custom_cluster/test_restart_services.py::TestRestart::()::test_restart_statestore_query_resilience;
> -- connecting to: 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:21000
> -- executing async: 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:21000
> select distinct * from tpch_parquet.lineitem where l_orderkey > sleep(1000);
> -- 2019-05-21 20:10:07,491 INFO     MainThread: Started query 
> a745b14e6a951582:ef6b8c9600000000
> -- getting state for operation: 
> <tests.common.impala_connection.OperationHandle object at 0x5001ed0>
> -- getting state for operation: 
> <tests.common.impala_connection.OperationHandle object at 0x5001ed0>
> -- getting state for operation: 
> <tests.common.impala_connection.OperationHandle object at 0x5001ed0>
> -- 2019-05-21 20:10:08,518 INFO     MainThread: Killing <StateStoreProcess 
> PID: 23688 
> (/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/be/build/latest/service/statestored
>  -logbufsecs=5 -v=1 -max_log_files=0 -log_filename=statestored 
> -log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
>  --statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50)> with signal 9
> -- 2019-05-21 20:10:08,541 INFO     MainThread: No PID found for process 
> cmdline: 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/be/build/latest/service/statestored
>  -logbufsecs=5 -v=1 -max_log_files=0 -log_filename=statestored 
> -log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
>  --statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50. Process is dead?
> -- 2019-05-21 20:10:08,541 INFO     MainThread: Waiting for exit: 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/be/build/latest/service/statestored
>  -logbufsecs=5 -v=1 -max_log_files=0 -log_filename=statestored 
> -log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
>  --statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50 (PID: None)
> -- 2019-05-21 20:10:08,552 INFO     MainThread: Starting statestored with 
> arguments
> -- 2019-05-21 20:10:08,560 INFO     MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25010
> -- 2019-05-21 20:10:08,561 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:08,561 INFO     MainThread: Debug webpage not yet 
> available: ('Connection aborted.', error(111, 'Connection refused'))
> Turning perftools heap leak checking off
> Redirecting stderr to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.ERROR
> -- 2019-05-21 20:10:09,564 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:09,565 INFO     MainThread: Waiting for metric value 
> 'statestore.live-backends'=4. Current value: 0
> -- 2019-05-21 20:10:09,565 INFO     MainThread: Sleeping 1s before next retry.
> -- 2019-05-21 20:10:10,566 INFO     MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25010
> -- 2019-05-21 20:10:10,567 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:10,568 INFO     MainThread: Waiting for metric value 
> 'statestore.live-backends'=4. Current value: 0
> -- 2019-05-21 20:10:10,568 INFO     MainThread: Sleeping 1s before next retry.
> -- 2019-05-21 20:10:11,570 INFO     MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25010
> -- 2019-05-21 20:10:11,571 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:11,572 INFO     MainThread: Waiting for metric value 
> 'statestore.live-backends'=4. Current value: 1
> -- 2019-05-21 20:10:11,572 INFO     MainThread: Sleeping 1s before next retry.
> -- 2019-05-21 20:10:12,573 INFO     MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25010
> -- 2019-05-21 20:10:12,574 INFO     MainThread: Starting new HTTP connection 
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:12,575 INFO     MainThread: Metric 
> 'statestore.live-backends' has reached desired value: 4
> -- getting state for operation: 
> <tests.common.impala_connection.OperationHandle object at 0x5001ed0>
> -- 2019-05-21 20:10:18,607 INFO     MainThread: Killing <StateStoreProcess 
> PID: 24937 
> (/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/be/build/latest/service/statestored
>  -logbufsecs=5 -v=1 -max_log_files=0 -log_filename=statestored 
> -log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
>  --statestore_update_frequency_ms=50 
> --statestore_priority_update_frequency_ms=50 
> --statestore_heartbeat_frequency_ms=50)> with signal 9
> -- 2019-05-21 20:10:18,641 INFO     MainThread: Killing <ImpaladProcess PID: 
> 23778 
> (/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/be/build/latest/service/impalad
>  -kudu_client_rpc_timeout_ms 0 -kudu_master_hosts localhost 
> -mem_limit=12884901888 -logbufsecs=5 -v=1 -max_log_files=0 
> -log_filename=impalad_node1 
> -log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
>  -beeswax_port=21001 -hs2_port=21051 -be_port=22001 -krpc_port=27001 
> -state_store_subscriber_port=23001 -webserver_port=25001 
> --statestore_subscriber_timeout_seconds=2 
> --failed_backends_query_cancellation_grace_period_ms=5000 
> --default_query_options=allow_erasure_coded_files=true)> with signal 9
> -- 2019-05-21 20:10:18,651 INFO     MainThread: Starting statestored with 
> arguments
> -- waiting for query to reach FINISHED state: 
> <tests.common.impala_connection.OperationHandle object at 0x5001ed0>
> Turning perftools heap leak checking off
> Redirecting stderr to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.ERROR
> -- closing connection to: 
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:21000
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to