Attila Doroszlai created HDDS-13815:
---------------------------------------

             Summary: Test for repairing ratis transaction is flaky
                 Key: HDDS-13815
                 URL: https://issues.apache.org/jira/browse/HDDS-13815
             Project: Apache Ozone
          Issue Type: Sub-task
          Components: test
            Reporter: Attila Doroszlai


{code:title=https://github.com/apache/ozone/actions/runs/18643469218/job/53146856420#step:13:534}
Command timed out or failed => OMs are not running as expected. Test for repairing ratis transaction failed.
ERROR: Test execution of ozonesecure-ha/test-repair-tools.sh is FAILED!!!!
{code}

The test should wait for OM leader election after restarting all OMs, before 
checking volume list:

{code:title=https://github.com/apache/ozone/blob/aee5aa31a733fa573e2793f905dd7666194e09e5/hadoop-ozone/dist/src/main/compose/ozonesecure-ha/test-repair-tools.sh#L82-L88}
repair_and_restart_om "ozonesecure-ha-om1-1" "om1"
repair_and_restart_om "ozonesecure-ha-om2-1" "om2"
repair_and_restart_om "ozonesecure-ha-om3-1" "om3"
if ! execute_command_in_container scm1.org timeout 15s ozone sh volume list 1>/dev/null; then
  echo "Command timed out or failed => OMs are not running as expected. Test for repairing ratis transaction failed."
  exit 1
fi
{code}
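
One way to tolerate the leader election delay is to retry the check until it succeeds or a deadline passes, instead of running it once. The following is only a minimal sketch: {{retry_until_success}} is a hypothetical helper name, not an existing function in the Ozone test library, and the timeout and interval values are assumptions that would need tuning.

```shell
# Hypothetical helper (not part of the Ozone test library):
# retry a command until it succeeds or the deadline passes.
# Usage: retry_until_success <timeout_seconds> <interval_seconds> <command...>
retry_until_success() {
  local timeout="$1"
  local interval="$2"
  shift 2

  local deadline
  deadline=$(( $(date +%s) + timeout ))

  # Re-run the command (stdout discarded) until it exits with status 0.
  until "$@" 1>/dev/null; do
    if [ "$(date +%s)" -ge "${deadline}" ]; then
      # Deadline reached without a successful run.
      return 1
    fi
    sleep "${interval}"
  done
}
```

In the test, the single volume list check could then be wrapped, for example as {{retry_until_success 120 5 execute_command_in_container scm1.org timeout 15s ozone sh volume list}}, keeping the existing error message and {{exit 1}} for the case where the helper returns non-zero. If the compose test library already offers a dedicated way to wait for an OM leader, using that would likely be preferable to polling a client command.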

First failure on {{master}}: 2025/10/19
https://github.com/apache/ozone/actions/runs/18624466139/job/53100660151

Test has not been changed since 2025/07/30.  Failure may be triggered by a 
recent [GitHub runner update|https://github.com/actions/runner-images/releases/tag/ubuntu24%2F20251014.76],
which upgraded the Linux kernel, but that's just a guess.


