[ https://issues.apache.org/jira/browse/IMPALA-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873328#comment-17873328 ]

ASF subversion and git services commented on IMPALA-13167:
----------------------------------------------------------

Commit 0c2af2de55a38b2f93d22ac49f6e202c6bcea685 in impala's branch 
refs/heads/branch-4.4.1 from jasonmfehr
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0c2af2de5 ]

IMPALA-13167: Fix Workload Management Tests Timing out

The workload management tests in test_query_log.py have been timing out
when they wait for workload management to fully initialize the
sys.impala_query_log and sys.impala_query_live tables. These tests do
not find the log message stating that the sys.impala_query_log table
has been created. These tests use the assert_impalad_log_contains
function from impala_test_suite.py to search for the relevant log
message. By default, this function only allows 6 seconds for this
message to appear. In bigger clusters that have larger amounts of data
to sync from the statestore and catalog, this time is not long enough.

This patch increases the timeout that the tests will wait before timing
out from 6 seconds to 1 minute. The longer timeout gives the cluster
more time to completely start and workload management more time to
initialize before the test fails.

Change-Id: I7ca8c7543360b5cb183cfb0b0b515d38c17e0974
Reviewed-on: http://gerrit.cloudera.org:8080/21549
Reviewed-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Andrew Sherman <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
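
For context, the sketch below shows the shape of the change described in the
commit message, assuming assert_impalad_log_contains() accepts a timeout_s
keyword; the test class name and log-line text are illustrative placeholders,
not the verbatim patch.

{code:python}
# A minimal sketch of the change described above, NOT the verbatim patch.
# Assumptions: assert_impalad_log_contains() accepts a timeout_s keyword,
# and the class name and log-line text are illustrative placeholders.
from tests.common.custom_cluster_test_suite import CustomClusterTestSuite


class TestQueryLogTimeout(CustomClusterTestSuite):  # hypothetical test class

  def wait_for_wm_init(self):
    # Before the fix the default 6 second timeout was used; on larger
    # clusters with more data to sync from the statestore and catalog,
    # table creation can take longer, so wait up to 1 minute instead.
    self.assert_impalad_log_contains(
        "INFO",
        "impala_query_log table created",  # placeholder for the real message
        timeout_s=60)
{code}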


> Impala's coordinator could not be connected to after a restart in a custom 
> cluster test in the ASAN build
> ----------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13167
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13167
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Fang-Yu Rao
>            Assignee: Jason Fehr
>            Priority: Minor
>              Labels: broken-build
>             Fix For: Impala 4.5.0
>
>
> In an internal Jenkins run, we found that it's possible that Impala's 
> coordinator could not be connected to after a restart that occurred after 
> the coordinator hit a DCHECK during a custom cluster test in the ASAN build 
> on ARM.
> Specifically, in that Jenkins run, we found that Impala's coordinator hit the 
> DCHECK in [RuntimeProfile::EventSequence::Start(int64_t 
> start_time_ns)|https://github.com/apache/impala/blob/master/be/src/util/runtime-profile-counters.h#L656]
>  while running a query in 
> [ranger_column_masking_complex_types.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test#L724-L732]
>  that was run by 
> [test_column_masking()|https://github.com/apache/impala/blob/master/tests/authorization/test_ranger.py#L1916].
>  This is a known issue as described in IMPALA-4631.
> Since the Impala daemons and the catalog server are restarted for each test 
> in test_ranger.py, the next test run after test_column_masking() should most 
> likely pass. However, this did not appear to be the case. We found that for 
> the next few tests (e.g., test_block_metadata_update()) in test_ranger.py, 
> Impala's pytest framework was not able to connect to the coordinator and 
> hence those tests failed with the following error.
> {code:java}
> -- 2024-06-18 08:49:43,350 INFO     MainThread: Starting cluster with 
> command: 
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/bin/start-impala-cluster.py
>  '--state_store_args=--statestore_update_frequency_ms=50     
> --statestore_priority_update_frequency_ms=50     
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3 
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests
>  --log_level=1 '--impalad_args=--server-name=server1 
> --ranger_service_type=hive --ranger_app_id=impala 
> --authorization_provider=ranger ' '--state_store_args=None ' 
> '--catalogd_args=--server-name=server1 --ranger_service_type=hive 
> --ranger_app_id=impala --authorization_provider=ranger ' 
> --impalad_args=--default_query_options=
> 08:49:43 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 08:49:43 MainThread: Starting State Store logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 08:49:43 MainThread: Starting Catalog Service logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 08:49:44 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 08:49:44 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 08:49:44 MainThread: Starting Impala Daemon logging to 
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 08:49:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:47 MainThread: Getting num_known_live_backends from 
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
> 08:49:47 MainThread: Debug webpage not yet available: 
> HTTPConnectionPool(host='impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com',
>  port=25000): Max retries exceeded with url: /backends?json (Caused by 
> NewConnectionError('<urllib3.connection.HTTPConnection object at 
> 0xffff8d176750>: Failed to establish a new connection: [Errno 111] Connection 
> refused',))
> 08:49:49 MainThread: Debug webpage did not become available in expected time.
> 08:49:49 MainThread: Waiting for num_known_live_backends=3. Current value: 
> None
> 08:49:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:50 MainThread: Getting num_known_live_backends from 
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
> 08:49:50 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 08:49:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:51 MainThread: Getting num_known_live_backends from 
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
> 08:49:51 MainThread: num_known_live_backends has reached value: 3
> 08:49:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:51 MainThread: Getting num_known_live_backends from 
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25001
> 08:49:51 MainThread: num_known_live_backends has reached value: 3
> 08:49:52 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:52 MainThread: Getting num_known_live_backends from 
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25002
> 08:49:52 MainThread: num_known_live_backends has reached value: 3
> 08:49:52 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3 
> executors).
> -- 2024-06-18 08:49:52,811 DEBUG    MainThread: Found 3 impalad/1 
> statestored/1 catalogd process(es)
> -- 2024-06-18 08:49:52,811 INFO     MainThread: Getting metric: 
> statestore.live-backends from 
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25010
> -- 2024-06-18 08:49:52,814 INFO     MainThread: Metric 
> 'statestore.live-backends' has reached desired value: 4
> -- 2024-06-18 08:49:52,814 DEBUG    MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
> -- 2024-06-18 08:49:52,816 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> -- 2024-06-18 08:49:52,816 DEBUG    MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25001
> -- 2024-06-18 08:49:52,818 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> -- 2024-06-18 08:49:52,818 DEBUG    MainThread: Getting 
> num_known_live_backends from 
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25002
> -- 2024-06-18 08:49:52,820 INFO     MainThread: num_known_live_backends has 
> reached value: 3
> SET 
> client_identifier=authorization/test_ranger.py::TestRanger::()::test_block_metadata_update[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresh;
> -- connecting to: localhost:21000
> -- 2024-06-18 08:49:52,821 INFO     MainThread: Could not connect to ('::1', 
> 21000, 0, 0)
> Traceback (most recent call last):
>   File 
> "/data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
>  line 137, in open
>     handle.connect(sockaddr)
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-asan-arm/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
>  line 228, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> -- connecting to localhost:21050 with impyla
> -- 2024-06-18 08:49:52,821 INFO     MainThread: Could not connect to ('::1', 
> 21050, 0, 0)
> Traceback (most recent call last):
>   File 
> "/data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
>  line 137, in open
>     handle.connect(sockaddr)
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-asan-arm/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
>  line 228, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> -- 2024-06-18 08:49:53,036 INFO     MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2024-06-18 08:49:53,058 INFO     MainThread: Closing active operation
> SET 
> client_identifier=authorization/test_ranger.py::TestRanger::()::test_block_metadata_update[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresh;
> -- connecting to: localhost:21000
> -- 2024-06-18 08:49:53,061 INFO     MainThread: Could not connect to ('::1', 
> 21000, 0, 0)
> Traceback (most recent call last):
>   File 
> "/data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
>  line 137, in open
>     handle.connect(sockaddr)
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-asan-arm/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
>  line 228, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> SET 
> client_identifier=authorization/test_ranger.py::TestRanger::()::test_block_metadata_update[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresh;
> -- connecting to: localhost:21000
> -- 2024-06-18 08:49:53,062 INFO     MainThread: Could not connect to ('::1', 
> 21000, 0, 0)
> Traceback (most recent call last):
>   File 
> "/data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
>  line 137, in open
>     handle.connect(sockaddr)
>   File 
> "/data/jenkins/workspace/impala-asf-master-core-asan-arm/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
>  line 228, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> {code}
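> The startup probing visible in the log above can be sketched as follows; this 
> is a hypothetical illustration of polling the coordinator's /backends?json 
> debug page, not the test framework's actual implementation (the "backends" 
> key in the response is assumed):
> {code:python}
> # Hypothetical sketch: poll a coordinator's debug webpage until the
> # expected number of live backends is reported or a deadline passes.
> # The log's tracebacks show a Python 2.7 environment, hence urllib2.
> import json
> import time
> import urllib2
> 
> 
> def wait_for_num_known_live_backends(host, port, expected, timeout_s=60):
>   deadline = time.time() + timeout_s
>   while time.time() < deadline:
>     try:
>       resp = urllib2.urlopen("http://%s:%d/backends?json" % (host, port))
>       backends = json.loads(resp.read()).get("backends", [])
>       if len(backends) == expected:
>         return True  # e.g. "num_known_live_backends has reached value: 3"
>     except Exception:
>       # e.g. [Errno 111] Connection refused while the daemon restarts
>       pass
>     time.sleep(1)
>   return False
> {code}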


