[
https://issues.apache.org/jira/browse/IMPALA-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Laszlo Gaal updated IMPALA-13167:
---------------------------------
Summary: Impala's coordinator could not be connected after a restart in
custom cluster test in the ASAN build (was: Impala's coordinator could not be
connected after a restart in custom cluster test in the ASAN build on ARM)
> Impala's coordinator could not be connected after a restart in custom cluster
> test in the ASAN build
> ----------------------------------------------------------------------------------------------------
>
> Key: IMPALA-13167
> URL: https://issues.apache.org/jira/browse/IMPALA-13167
> Project: IMPALA
> Issue Type: Bug
> Reporter: Fang-Yu Rao
> Assignee: Jason Fehr
> Priority: Minor
> Labels: broken-build
> Fix For: Impala 4.5.0
>
>
> In an internal Jenkins run, we found that Impala's coordinator could become
> unreachable after a restart that followed the coordinator hitting a DCHECK
> during the custom cluster test in the ASAN build on ARM.
> Specifically, in that Jenkins run, we found that Impala's coordinator hit the
> DCHECK in [RuntimeProfile::EventSequence::Start(int64_t
> start_time_ns)|https://github.com/apache/impala/blob/master/be/src/util/runtime-profile-counters.h#L656]
> while running a query in
> [ranger_column_masking_complex_types.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test#L724-L732]
> that was run by
> [test_column_masking()|https://github.com/apache/impala/blob/master/tests/authorization/test_ranger.py#L1916].
> This is a known issue as described in IMPALA-4631.
> Since Impala daemons and the catalog server are restarted for each test in
> test_ranger.py, the tests that run after test_column_masking() would be
> expected to pass. However, this was not the case. For the next few tests
> (e.g., test_block_metadata_update()) in test_ranger.py, Impala's pytest
> framework was not able to connect to the coordinator. It failed with the
> following error, and hence those tests failed.
> {code:java}
> -- 2024-06-18 08:49:43,350 INFO MainThread: Starting cluster with
> command:
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/bin/start-impala-cluster.py
> '--state_store_args=--statestore_update_frequency_ms=50
> --statestore_priority_update_frequency_ms=50
> --statestore_heartbeat_frequency_ms=50' --cluster_size=3 --num_coordinators=3
> --log_dir=/data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests
> --log_level=1 '--impalad_args=--server-name=server1
> --ranger_service_type=hive --ranger_app_id=impala
> --authorization_provider=ranger ' '--state_store_args=None '
> '--catalogd_args=--server-name=server1 --ranger_service_type=hive
> --ranger_app_id=impala --authorization_provider=ranger '
> --impalad_args=--default_query_options=
> 08:49:43 MainThread: Found 0 impalad/0 statestored/0 catalogd process(es)
> 08:49:43 MainThread: Starting State Store logging to
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/statestored.INFO
> 08:49:43 MainThread: Starting Catalog Service logging to
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
> 08:49:44 MainThread: Starting Impala Daemon logging to
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad.INFO
> 08:49:44 MainThread: Starting Impala Daemon logging to
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
> 08:49:44 MainThread: Starting Impala Daemon logging to
> /data/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
> 08:49:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:47 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:47 MainThread: Getting num_known_live_backends from
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
> 08:49:47 MainThread: Debug webpage not yet available:
> HTTPConnectionPool(host='impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com',
> port=25000): Max retries exceeded with url: /backends?json (Caused by
> NewConnectionError('<urllib3.connection.HTTPConnection object at
> 0xffff8d176750>: Failed to establish a new connection: [Errno 111] Connection
> refused',))
> 08:49:49 MainThread: Debug webpage did not become available in expected time.
> 08:49:49 MainThread: Waiting for num_known_live_backends=3. Current value:
> None
> 08:49:50 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:50 MainThread: Getting num_known_live_backends from
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
> 08:49:50 MainThread: Waiting for num_known_live_backends=3. Current value: 0
> 08:49:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:51 MainThread: Getting num_known_live_backends from
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
> 08:49:51 MainThread: num_known_live_backends has reached value: 3
> 08:49:51 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:51 MainThread: Getting num_known_live_backends from
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25001
> 08:49:51 MainThread: num_known_live_backends has reached value: 3
> 08:49:52 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
> 08:49:52 MainThread: Getting num_known_live_backends from
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25002
> 08:49:52 MainThread: num_known_live_backends has reached value: 3
> 08:49:52 MainThread: Impala Cluster Running with 3 nodes (3 coordinators, 3
> executors).
> -- 2024-06-18 08:49:52,811 DEBUG MainThread: Found 3 impalad/1
> statestored/1 catalogd process(es)
> -- 2024-06-18 08:49:52,811 INFO MainThread: Getting metric:
> statestore.live-backends from
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25010
> -- 2024-06-18 08:49:52,814 INFO MainThread: Metric
> 'statestore.live-backends' has reached desired value: 4
> -- 2024-06-18 08:49:52,814 DEBUG MainThread: Getting
> num_known_live_backends from
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25000
> -- 2024-06-18 08:49:52,816 INFO MainThread: num_known_live_backends has
> reached value: 3
> -- 2024-06-18 08:49:52,816 DEBUG MainThread: Getting
> num_known_live_backends from
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25001
> -- 2024-06-18 08:49:52,818 INFO MainThread: num_known_live_backends has
> reached value: 3
> -- 2024-06-18 08:49:52,818 DEBUG MainThread: Getting
> num_known_live_backends from
> impala-ec2-rhel88-m7g-4xlarge-ondemand-1d18.vpc.cloudera.com:25002
> -- 2024-06-18 08:49:52,820 INFO MainThread: num_known_live_backends has
> reached value: 3
> SET
> client_identifier=authorization/test_ranger.py::TestRanger::()::test_block_metadata_update[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresh;
> -- connecting to: localhost:21000
> -- 2024-06-18 08:49:52,821 INFO MainThread: Could not connect to ('::1',
> 21000, 0, 0)
> Traceback (most recent call last):
> File
> "/data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
> line 137, in open
> handle.connect(sockaddr)
> File
> "/data/jenkins/workspace/impala-asf-master-core-asan-arm/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
> line 228, in meth
> return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> -- connecting to localhost:21050 with impyla
> -- 2024-06-18 08:49:52,821 INFO MainThread: Could not connect to ('::1',
> 21050, 0, 0)
> Traceback (most recent call last):
> File
> "/data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
> line 137, in open
> handle.connect(sockaddr)
> File
> "/data/jenkins/workspace/impala-asf-master-core-asan-arm/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
> line 228, in meth
> return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> -- 2024-06-18 08:49:53,036 INFO MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2024-06-18 08:49:53,058 INFO MainThread: Closing active operation
> SET
> client_identifier=authorization/test_ranger.py::TestRanger::()::test_block_metadata_update[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresh;
> -- connecting to: localhost:21000
> -- 2024-06-18 08:49:53,061 INFO MainThread: Could not connect to ('::1',
> 21000, 0, 0)
> Traceback (most recent call last):
> File
> "/data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
> line 137, in open
> handle.connect(sockaddr)
> File
> "/data/jenkins/workspace/impala-asf-master-core-asan-arm/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
> line 228, in meth
> return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> SET
> client_identifier=authorization/test_ranger.py::TestRanger::()::test_block_metadata_update[protocol:beeswax|exec_option:{'test_replan':1;'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_thresh;
> -- connecting to: localhost:21000
> -- 2024-06-18 08:49:53,062 INFO MainThread: Could not connect to ('::1',
> 21000, 0, 0)
> Traceback (most recent call last):
> File
> "/data0/jenkins/workspace/impala-asf-master-core-asan-arm/repos/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py",
> line 137, in open
> handle.connect(sockaddr)
> File
> "/data/jenkins/workspace/impala-asf-master-core-asan-arm/Impala-Toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/socket.py",
> line 228, in meth
> return getattr(self._sock,name)(*args)
> error: [Errno 111] Connection refused
> {code}
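
The log above shows the client ports (21000 and 21050 on localhost) refusing connections even though the debug webpage already reported num_known_live_backends=3. One way to investigate this kind of failure is to probe the client port directly before opening a client. The following is a minimal sketch, not the Impala test framework's own code: the helper names can_connect and wait_for_port, their parameters, and the default timeouts are all hypothetical, and it is written for Python 3 rather than the Python 2.7 environment seen in the traceback.

```python
import socket
import time


def can_connect(host, port, timeout_s=1.0):
    """Return True if any address that `host` resolves to accepts a TCP connection.

    Hypothetical helper for illustration; not part of Impala's test framework.
    """
    try:
        addrinfos = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    for family, socktype, proto, _canonname, sockaddr in addrinfos:
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout_s)
            sock.connect(sockaddr)
            return True
        except OSError:
            # For example [Errno 111] Connection refused; try the next address.
            continue
        finally:
            if sock is not None:
                sock.close()
    return False


def wait_for_port(host, port, deadline_s=30.0, interval_s=0.5):
    """Poll until `host:port` accepts a connection or `deadline_s` elapses."""
    end = time.monotonic() + deadline_s
    while True:
        if can_connect(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

For example, calling wait_for_port("localhost", 21000) before creating the client would show whether the port ever starts accepting connections after the restart, or whether it stays closed for the whole wait.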