[
https://issues.apache.org/jira/browse/IMPALA-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846248#comment-16846248
]
Bikramjeet Vig commented on IMPALA-8570:
----------------------------------------
Based on the logs it seems like the query is scheduled to run on only 2 out of
the 3 backends
{noformat}
20:10:07.493836 24877 coordinator.cc:355] a745b14e6a951582:ef6b8c9600000000]
starting execution on 2 backends for query_id=a745b14e6a951582:ef6b8c9600000000
{noformat}
and the impalad which is shutdown as a part of the test is the one which is not
running the query, so the query never gets cancelled. The test was written
assuming the query would always be scheduled on all 3 backends, but that
assumption does not hold true when the test is run on a non standard
minicluster config (which in this case is because the test ran on a minicluster
with erasure coding turned on). Best solution here is to make sure it only runs
on a regular minicluster.
> test_restart_statestore_query_resilience AssertionError: Query expected to
> fail
> -------------------------------------------------------------------------------
>
> Key: IMPALA-8570
> URL: https://issues.apache.org/jira/browse/IMPALA-8570
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 3.3.0
> Reporter: Thomas Tauber-Marshall
> Assignee: Bikramjeet Vig
> Priority: Blocker
>
> Seen in an erasure coding build
> {noformat}
> Error Message
> AssertionError: Query expected to fail assert False
> Stacktrace
> custom_cluster/test_restart_services.py:129: in
> test_restart_statestore_query_resilience
> assert False, "Query expected to fail"
> E AssertionError: Query expected to fail
> E assert False
> Standard Output
> Redirecting stdout to
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> Redirecting stdout to
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> Standard Error
> -- 2019-05-21 20:09:58,348 INFO MainThread: Starting cluster with
> command:
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/start-impala-cluster.py
> --cluster_size=3 --num_coordinators=3
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
> --log_level=1 '--impalad_args=--statestore_subscriber_timeout_seconds=2
> --failed_backends_query_cancellation_grace_period_ms=5000 '
> '--state_store_args=--statestore_update_frequency_ms=50
> --statestore_priority_update_frequency_ms=50
> --statestore_heartbeat_frequency_ms=50 '
> '--catalogd_args=--statestore_subscriber_timeout_seconds=2 '
> --impalad_args=--default_query_options=allow_erasure_coded_files=true
> 20:09:58 MainThread: Starting State Store logging to
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 20:09:58 MainThread: Starting Catalog Service logging to
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 20:09:58 MainThread: Starting Impala Daemon logging to
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 20:09:58 MainThread: Starting Impala Daemon logging to
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 20:09:58 MainThread: Starting Impala Daemon logging to
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 20:10:01 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 20:10:01 MainThread: Getting num_known_live_backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25000
> 20:10:01 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 20:10:02 MainThread: Getting num_known_live_backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25000
> 20:10:02 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 20:10:03 MainThread: Getting num_known_live_backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25000
> 20:10:03 MainThread: num_known_live_backends has reached value: 3
> 20:10:04 MainThread: Getting num_known_live_backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25001
> 20:10:04 MainThread: num_known_live_backends has reached value: 3
> 20:10:05 MainThread: Getting num_known_live_backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25002
> 20:10:05 MainThread: num_known_live_backends has reached value: 3
> 20:10:05 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3
> executors).
> -- 2019-05-21 20:10:05,585 INFO MainThread: Found 3 impalad/1
> statestored/1 catalogd process(es)
> -- 2019-05-21 20:10:05,585 INFO MainThread: Getting metric:
> statestore.live-backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25010
> -- 2019-05-21 20:10:05,587 INFO MainThread: Starting new HTTP connection
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:05,588 INFO MainThread: Metric
> 'statestore.live-backends' has reached desired value: 4
> -- 2019-05-21 20:10:05,588 INFO MainThread: Getting
> num_known_live_backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25000
> -- 2019-05-21 20:10:05,589 INFO MainThread: Starting new HTTP connection
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:05,590 INFO MainThread: num_known_live_backends has
> reached value: 3
> -- 2019-05-21 20:10:05,590 INFO MainThread: Getting
> num_known_live_backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25001
> -- 2019-05-21 20:10:05,591 INFO MainThread: Starting new HTTP connection
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:05,592 INFO MainThread: num_known_live_backends has
> reached value: 3
> -- 2019-05-21 20:10:05,592 INFO MainThread: Getting
> num_known_live_backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25002
> -- 2019-05-21 20:10:05,592 INFO MainThread: Starting new HTTP connection
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:05,593 INFO MainThread: num_known_live_backends has
> reached value: 3
> SET
> client_identifier=custom_cluster/test_restart_services.py::TestRestart::()::test_restart_statestore_query_resilience;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2019-05-21 20:10:05,749 INFO MainThread: Closing active operation
> SET
> client_identifier=custom_cluster/test_restart_services.py::TestRestart::()::test_restart_statestore_query_resilience;
> -- connecting to:
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:21000
> -- executing async:
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:21000
> select distinct * from tpch_parquet.lineitem where l_orderkey > sleep(1000);
> -- 2019-05-21 20:10:07,491 INFO MainThread: Started query
> a745b14e6a951582:ef6b8c9600000000
> -- getting state for operation:
> <tests.common.impala_connection.OperationHandle object at 0x5001ed0>
> -- getting state for operation:
> <tests.common.impala_connection.OperationHandle object at 0x5001ed0>
> -- getting state for operation:
> <tests.common.impala_connection.OperationHandle object at 0x5001ed0>
> -- 2019-05-21 20:10:08,518 INFO MainThread: Killing <StateStoreProcess
> PID: 23688
> (/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/be/build/latest/service/statestored
> -logbufsecs=5 -v=1 -max_log_files=0 -log_filename=statestored
> -log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
> --statestore_update_frequency_ms=50
> --statestore_priority_update_frequency_ms=50
> --statestore_heartbeat_frequency_ms=50)> with signal 9
> -- 2019-05-21 20:10:08,541 INFO MainThread: No PID found for process
> cmdline:
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/be/build/latest/service/statestored
> -logbufsecs=5 -v=1 -max_log_files=0 -log_filename=statestored
> -log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
> --statestore_update_frequency_ms=50
> --statestore_priority_update_frequency_ms=50
> --statestore_heartbeat_frequency_ms=50. Process is dead?
> -- 2019-05-21 20:10:08,541 INFO MainThread: Waiting for exit:
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/be/build/latest/service/statestored
> -logbufsecs=5 -v=1 -max_log_files=0 -log_filename=statestored
> -log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
> --statestore_update_frequency_ms=50
> --statestore_priority_update_frequency_ms=50
> --statestore_heartbeat_frequency_ms=50 (PID: None)
> -- 2019-05-21 20:10:08,552 INFO MainThread: Starting statestored with
> arguments
> -- 2019-05-21 20:10:08,560 INFO MainThread: Getting metric:
> statestore.live-backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25010
> -- 2019-05-21 20:10:08,561 INFO MainThread: Starting new HTTP connection
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:08,561 INFO MainThread: Debug webpage not yet
> available: ('Connection aborted.', error(111, 'Connection refused'))
> Turning perftools heap leak checking off
> Redirecting stderr to
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.ERROR
> -- 2019-05-21 20:10:09,564 INFO MainThread: Starting new HTTP connection
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:09,565 INFO MainThread: Waiting for metric value
> 'statestore.live-backends'=4. Current value: 0
> -- 2019-05-21 20:10:09,565 INFO MainThread: Sleeping 1s before next retry.
> -- 2019-05-21 20:10:10,566 INFO MainThread: Getting metric:
> statestore.live-backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25010
> -- 2019-05-21 20:10:10,567 INFO MainThread: Starting new HTTP connection
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:10,568 INFO MainThread: Waiting for metric value
> 'statestore.live-backends'=4. Current value: 0
> -- 2019-05-21 20:10:10,568 INFO MainThread: Sleeping 1s before next retry.
> -- 2019-05-21 20:10:11,570 INFO MainThread: Getting metric:
> statestore.live-backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25010
> -- 2019-05-21 20:10:11,571 INFO MainThread: Starting new HTTP connection
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:11,572 INFO MainThread: Waiting for metric value
> 'statestore.live-backends'=4. Current value: 1
> -- 2019-05-21 20:10:11,572 INFO MainThread: Sleeping 1s before next retry.
> -- 2019-05-21 20:10:12,573 INFO MainThread: Getting metric:
> statestore.live-backends from
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:25010
> -- 2019-05-21 20:10:12,574 INFO MainThread: Starting new HTTP connection
> (1): impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com
> -- 2019-05-21 20:10:12,575 INFO MainThread: Metric
> 'statestore.live-backends' has reached desired value: 4
> -- getting state for operation:
> <tests.common.impala_connection.OperationHandle object at 0x5001ed0>
> -- 2019-05-21 20:10:18,607 INFO MainThread: Killing <StateStoreProcess
> PID: 24937
> (/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/be/build/latest/service/statestored
> -logbufsecs=5 -v=1 -max_log_files=0 -log_filename=statestored
> -log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
> --statestore_update_frequency_ms=50
> --statestore_priority_update_frequency_ms=50
> --statestore_heartbeat_frequency_ms=50)> with signal 9
> -- 2019-05-21 20:10:18,641 INFO MainThread: Killing <ImpaladProcess PID:
> 23778
> (/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/be/build/latest/service/impalad
> -kudu_client_rpc_timeout_ms 0 -kudu_master_hosts localhost
> -mem_limit=12884901888 -logbufsecs=5 -v=1 -max_log_files=0
> -log_filename=impalad_node1
> -log_dir=/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests
> -beeswax_port=21001 -hs2_port=21051 -be_port=22001 -krpc_port=27001
> -state_store_subscriber_port=23001 -webserver_port=25001
> --statestore_subscriber_timeout_seconds=2
> --failed_backends_query_cancellation_grace_period_ms=5000
> --default_query_options=allow_erasure_coded_files=true)> with signal 9
> -- 2019-05-21 20:10:18,651 INFO MainThread: Starting statestored with
> arguments
> -- waiting for query to reach FINISHED state:
> <tests.common.impala_connection.OperationHandle object at 0x5001ed0>
> Turning perftools heap leak checking off
> Redirecting stderr to
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/custom_cluster_tests/statestored.ERROR
> -- closing connection to:
> impala-ec2-centos74-m5-4xlarge-ondemand-1173.vpc.cloudera.com:21000
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]