Hi systemds developers,
Recently I've had some discussions about flaky tests that would be useful to all systemds developers. Some federated tests fail when executed locally, while the GitHub actions do not reflect the same bug. Usually a test that fails locally also fails in GitHub actions if it fails consistently, but I have added a retry so that the actions rerun failed tests up to 3 times. In practice this means that a test has to fail 3 times before we do not get a green mark on GitHub.

If you have tests that fail locally, you can try to increase the rerun count like this:

`mvn clean compile test -Dmaven.test.skip=false -Drerun.failingtests.count=1 -Dtest=org.apache.sysds.test.functions.federated.primitives.FederatedFullAggregateTest`

Here `-Drerun.failingtests.count=1` means that a test is repeated if it fails once.

As an example of flaky test execution from the latest commit, you can see some tests fail while their reruns pass in the following log:

https://github.com/apache/systemds/runs/5604072007?check_suite_focus=true

I will see if we can make the GitHub actions change the result to mark unstable tests in the future, and we have to address these bugs/stability issues for multi-tenant federated workers.

Best regards,
Sebastian
