[ 
https://issues.apache.org/jira/browse/IMPALA-7931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711843#comment-16711843
 ] 

Tim Armstrong commented on IMPALA-7931:
---------------------------------------

{noformat}
12:51:11 =================================== FAILURES 
===================================
12:51:11 __________________ TestShutdownCommand.test_shutdown_executor 
__________________
12:51:11 custom_cluster/test_restart_services.py:209: in test_shutdown_executor
12:51:11     assert self.__fetch_and_get_num_backends(QUERY, 
before_shutdown_handle) == 3
12:51:11 custom_cluster/test_restart_services.py:356: in 
__fetch_and_get_num_backends
12:51:11     self.client.QUERY_STATES['FINISHED'], timeout=20)
12:51:11 common/impala_service.py:267: in wait_for_query_state
12:51:11     target_state, query_state)
12:51:11 E   AssertionError: Did not reach query state in time target=4 actual=5
12:51:11 ---------------------------- Captured stderr setup 
-----------------------------
12:51:11 -- 2018-12-04 12:31:17,879 INFO     MainThread: Starting cluster with 
command: 
/data/jenkins/workspace/impala-cdh6.1.x-core-s3/repos/Impala/bin/start-impala-cluster.py
 --cluster_size=3 --num_coordinators=3 
--log_dir=/data/jenkins/workspace/impala-cdh6.1.x-core-s3/repos/Impala/logs/custom_cluster_tests
 --log_level=1 '--impalad_args="--shutdown_grace_period_s=5           
--shutdown_deadline_s=10           
--hostname=impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com" ' 
'--state_store_args="--statestore_update_frequency_ms=50     
--statestore_priority_update_frequency_ms=50     
--statestore_heartbeat_frequency_ms=50" ' 
--impalad_args=--default_query_options=
12:51:11 12:31:18 MainThread: Starting State Store logging to 
/data/jenkins/workspace/impala-cdh6.1.x-core-s3/repos/Impala/logs/custom_cluster_tests/statestored.INFO
12:51:11 12:31:19 MainThread: Starting Catalog Service logging to 
/data/jenkins/workspace/impala-cdh6.1.x-core-s3/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
12:51:11 12:31:20 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-cdh6.1.x-core-s3/repos/Impala/logs/custom_cluster_tests/impalad.INFO
12:51:11 12:31:21 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-cdh6.1.x-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
12:51:11 12:31:22 MainThread: Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-cdh6.1.x-core-s3/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
12:51:11 12:31:25 MainThread: Found 3 impalad/1 statestored/1 catalogd 
process(es)
12:51:11 12:31:25 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com:25000
12:51:11 12:31:25 MainThread: Waiting for num_known_live_backends=3. Current 
value: 0
12:51:11 12:31:26 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com:25000
12:51:11 12:31:26 MainThread: Waiting for num_known_live_backends=3. Current 
value: 0
12:51:11 12:31:27 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com:25000
12:51:11 12:31:27 MainThread: Waiting for num_known_live_backends=3. Current 
value: 2
12:51:11 12:31:28 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com:25000
12:51:11 12:31:28 MainThread: Waiting for num_known_live_backends=3. Current 
value: 2
12:51:11 12:31:29 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com:25000
12:51:11 12:31:29 MainThread: num_known_live_backends has reached value: 3
12:51:11 12:31:29 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com:25001
12:51:11 12:31:29 MainThread: num_known_live_backends has reached value: 3
12:51:11 12:31:30 MainThread: Getting num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com:25002
12:51:11 12:31:30 MainThread: num_known_live_backends has reached value: 3
12:51:11 12:31:30 MainThread: Impala Cluster Running with 3 nodes (3 
coordinators, 3 executors).
12:51:11 -- 2018-12-04 12:31:31,000 INFO     MainThread: Found 3 impalad/1 
statestored/1 catalogd process(es)
12:51:11 -- 2018-12-04 12:31:31,000 INFO     MainThread: Getting metric: 
statestore.live-backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com:25010
12:51:11 -- 2018-12-04 12:31:31,001 INFO     MainThread: Starting new HTTP 
connection (1): impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com
12:51:11 -- 2018-12-04 12:31:31,003 INFO     MainThread: Metric 
'statestore.live-backends' has reached desired value: 4
12:51:11 -- 2018-12-04 12:31:31,003 INFO     MainThread: Getting 
num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com:25000
12:51:11 -- 2018-12-04 12:31:31,004 INFO     MainThread: Starting new HTTP 
connection (1): impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com
12:51:11 -- 2018-12-04 12:31:31,005 INFO     MainThread: 
num_known_live_backends has reached value: 3
12:51:11 -- 2018-12-04 12:31:31,005 INFO     MainThread: Getting 
num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com:25001
12:51:11 -- 2018-12-04 12:31:31,005 INFO     MainThread: Starting new HTTP 
connection (1): impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com
12:51:11 -- 2018-12-04 12:31:31,006 INFO     MainThread: 
num_known_live_backends has reached value: 3
12:51:11 -- 2018-12-04 12:31:31,006 INFO     MainThread: Getting 
num_known_live_backends from 
impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com:25002
12:51:11 -- 2018-12-04 12:31:31,007 INFO     MainThread: Starting new HTTP 
connection (1): impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com
12:51:11 -- 2018-12-04 12:31:31,008 INFO     MainThread: 
num_known_live_backends has reached value: 3
12:51:11 -- connecting to: localhost:21000
12:51:11 -- connecting to localhost:21050 with impyla
12:51:11 Conn <impala.hiveserver2.HiveServer2Connection object at 0x6538b50>
12:51:11 -- 2018-12-04 12:31:31,152 INFO     MainThread: Closing active 
operation
12:51:11 ----------------------------- Captured stderr call 
-----------------------------
12:51:11 SET num_scanner_threads=1;
12:51:11 -- executing async: localhost:21000
12:51:11 select count(*) from functional_parquet.alltypes where sleep(1) = 
bool_col;
12:51:11 
12:51:11 -- 2018-12-04 12:31:33,382 INFO     MainThread: Started query 
a34c3a84775e5599:b2b25eb900000000
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- executing against localhost:21000
12:51:11 : shutdown('localhost:22001');
12:51:11 
12:51:11 -- 2018-12-04 12:31:34,391 INFO     MainThread: Started query 
8f4d995b47a934be:a5b5cad000000000
12:51:11 -- 2018-12-04 12:31:34,395 INFO     MainThread: Starting new HTTP 
connection (1): impala-ec2-centos74-m5-4xlarge-ondemand-0b8b.vpc.cloudera.com
12:51:11 -- 2018-12-04 12:31:34,397 INFO     MainThread: Getting 
num_known_live_backends from localhost:25000
12:51:11 -- 2018-12-04 12:31:34,398 INFO     MainThread: Starting new HTTP 
connection (1): localhost
12:51:11 -- 2018-12-04 12:31:34,399 INFO     MainThread: Waiting for 
num_known_live_backends=2. Current value: 3
12:51:11 -- 2018-12-04 12:31:35,400 INFO     MainThread: Getting 
num_known_live_backends from localhost:25000
12:51:11 -- 2018-12-04 12:31:35,401 INFO     MainThread: Starting new HTTP 
connection (1): localhost
12:51:11 -- 2018-12-04 12:31:35,402 INFO     MainThread: 
num_known_live_backends has reached value: 2
12:51:11 SET num_scanner_threads=1;
12:51:11 -- executing async: localhost:21000
12:51:11 select count(*) from functional_parquet.alltypes where sleep(1) = 
bool_col;
12:51:11 
12:51:11 -- 2018-12-04 12:31:35,415 INFO     MainThread: Started query 
cd4bd0b7518145c6:4bd1e0ee00000000
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x600ead0>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x600ead0>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
12:51:11 -- getting state for operation: 
<tests.common.impala_connection.OperationHandle object at 0x5803f10>
{noformat}

> test_shutdown_executor fails with timeout waiting for query target state
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-7931
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7931
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 3.2.0
>            Reporter: Lars Volker
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: broken-build
>
> On a recent S3 test run test_shutdown_executor hit a timeout waiting for a 
> query to reach state FINISHED. Instead the query stays at state 5 (EXCEPTION).
> {noformat}
> 12:51:11 __________________ TestShutdownCommand.test_shutdown_executor 
> __________________
> 12:51:11 custom_cluster/test_restart_services.py:209: in 
> test_shutdown_executor
> 12:51:11     assert self.__fetch_and_get_num_backends(QUERY, 
> before_shutdown_handle) == 3
> 12:51:11 custom_cluster/test_restart_services.py:356: in 
> __fetch_and_get_num_backends
> 12:51:11     self.client.QUERY_STATES['FINISHED'], timeout=20)
> 12:51:11 common/impala_service.py:267: in wait_for_query_state
> 12:51:11     target_state, query_state)
> 12:51:11 E   AssertionError: Did not reach query state in time target=4 
> actual=5
> {noformat}
> From the logs I can see that the query fails because one of the executors 
> becomes unreachable:
> {noformat}
> I1204 12:31:39.954125  5609 impala-server.cc:1792] Query 
> a34c3a84775e5599:b2b25eb900000000: Failed due to unreachable impalad(s): 
> jenkins-worker:22001
> {noformat}
> The query was {{select count\(*) from functional_parquet.alltypes where 
> sleep(1) = bool_col}}. 
> It seems that the query took longer than expected and was still running when 
> the executor shut down.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to