Wenzhe Zhou created IMPALA-12550:
------------------------------------

             Summary: test_statestored_auto_failover_with_disabling_network 
flaky
                 Key: IMPALA-12550
                 URL: https://issues.apache.org/jira/browse/IMPALA-12550
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 4.4.0
            Reporter: Wenzhe Zhou
            Assignee: Wenzhe Zhou


TestStatestoredHA.test_statestored_auto_failover_with_disabling_network failed 
with following stack trace when repeatedly run this test.
tests/custom_cluster/test_statestored_ha.py:645: in 
test_statestored_auto_failover_with_disabling_network
    "statestore.in-ha-recovery-mode", expected_value=False, timeout=120)
tests/common/impala_service.py:144: in wait_for_metric_value
    self.__metric_timeout_assert(metric_name, expected_value, timeout)
tests/common/impala_service.py:213: in __metric_timeout_assert
    assert 0, assert_string
E   AssertionError: Metric statestore.in-ha-recovery-mode did not reach value 
False in 120s.


>From log messages, the issue was caused by the delay of HA Handshake RPC 
>between two statestore instances. Sometimes the active statestore took a few 
>minutes to response the handshake requests from standby statestore.

This issue is different from IMPALA-12525, which was caused locking issue on 
subscribers side.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to