I would like to report that I have fixed the testing warts that people were experiencing. For a comprehensive list of what was done, you can filter pull requests by the make-tests-easier-to-run label, i.e. https://github.com/apache/incubator-pekko/pulls?q=is%3Apr+label%3Amake-tests-easier-to-run+

In summary, two things changed. First, one test didn't clean up its state properly: it passed on the first run but failed on every subsequent run, which is why it wasn't picked up in CI. Second, a new sbt command called testQuickUntilPassed was added (documented both in README.md and in the sbt intro prompt). It runs the tests repeatedly, retrying only the ones that failed, until all of them pass, which takes care of flaky tests.
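For anyone curious about the retry idea behind testQuickUntilPassed, here is a minimal shell sketch. It is not the actual sbt implementation: retry_until_passed and its attempt limit are hypothetical names introduced purely for illustration. The sketch assumes it wraps sbt's built-in testQuick task, which reruns only tests that failed previously, have not run yet, or whose dependencies were recompiled.

```shell
# retry_until_passed MAX CMD...
# Hypothetical helper: rerun CMD until it exits 0 or MAX attempts are used up.
retry_until_passed() {
  max="$1"
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      echo "passed on attempt $attempt"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "still failing after $max attempts" >&2
  return 1
}

# Possible usage (assumes sbt is on the PATH and you are in the Pekko checkout):
# retry_until_passed 5 sbt testQuick
```

The attempt limit is a design choice of this sketch so that a genuinely broken test cannot loop forever; a test that keeps failing after several retries is probably not just flaky.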
There may still be more to fix (diagnosing these failures takes a lot of time), but for the purposes of validating releases it should be good enough. If you do experience further problems, please message the list or create a GitHub ticket.

On Thu, Jul 6, 2023 at 11:55 AM PJ Fanning <fannin...@gmail.com> wrote:

> Apologies, I forgot to add the link for [1] in my previous email.
>
> [1] https://github.com/apache/incubator-pekko/actions
>
> On Thu, 6 Jul 2023 at 10:53, PJ Fanning <fannin...@gmail.com> wrote:
> >
> > We can certainly prioritise work on making the tests easier to work
> > with. It would be good from a community building perspective.
> >
> > We have the tests working in the GitHub Actions CI framework [1].
> > Pekko is a complex clustering tool and it does require quite some
> > setup to run some of the tests.
> >
> > Unfortunately, blocking a 1.0.0 release would damage our hopes of
> > building the community.
> >
> >
> > On Thu, 6 Jul 2023 at 10:39, Matthew de Detrich
> > <matthew.dedetr...@aiven.io.invalid> wrote:
> > >
> > > > Efforts have been made in the past to clean things up, but the reality
> > > > is that it is very hard to make tests reliable in an asynchronous
> > > > framework and in general it is almost impossible to accommodate for
> > > > all possible running environments.
> > >
> > > I really want to highlight this takeaway, if we were so strict with other
> > > comparable Apache TLP (i.e. Kafka, Spark, Cassandra, Flink etc etc) then no
> > > release be ever made. While there is merit in discussing how bespoke the
> > > testing for Pekko is vs other "typical" ASF projects, if the expectation is
> > > that you can just run sbt test on a local laptop and the tests to reliably
> > > pass then thats not going to happen any time soon.
> > >
> > > As we speak I am adding in documentation in various places (i.e.
> > > https://cwiki.apache.org/confluence/display/PEKKO/Testing
> > > and https://github.com/apache/incubator-pekko/pull/469) for techniques on
> > > how to handle this (i.e. testQuick) and I will also document what Johannes
> > > said right now regarding the timing factor/timing tests specifically for
> > > Pekko core.
> > >
> > > On Thu, Jul 6, 2023 at 11:28 AM Johannes Rudolph <johannes.rudo...@gmail.com>
> > > wrote:
> > >
> > > > The main test suite that is run nightly is run with this command:
> > > >
> > > > sbt \
> > > >   -Dpekko.cluster.assert=on \
> > > >   -Dpekko.log.timestamps=true \
> > > >   -Dpekko.test.timefactor=2 \
> > > >   -Dpekko.actor.testkit.typed.timefactor=2 \
> > > >   -Dpekko.test.tags.exclude=gh-exclude,timing \
> > > >   -Dpekko.test.multi-in-test=false \
> > > >   clean "+~ ${{ matrix.scalaVersion }} test" checkTestsHaveRun
> > > >
> > > > https://github.com/apache/incubator-pekko/blob/88bf6329f193eedd45091f4f9a515943bd8ecb23/.github/workflows/nightly-builds.yml#L168-L175
> > > >
> > > > Unfortunately, the amount of flaky tests is high, so the important bits are
> > > >
> > > > -Dpekko.test.timefactor=2
> > > > -Dpekko.test.tags.exclude=gh-exclude,timing
> > > >
> > > > which makes timing in tests more lenient and also excludes some notorious
> > > > ones.
> > > >
> > > > Efforts have been made in the past to clean things up, but the reality
> > > > is that it is very hard to make tests reliable in an asynchronous
> > > > framework and in general it is almost impossible to accommodate for
> > > > all possible running environments.
> > > >
> > > > Johannes
> > > >
> > > > On Thu, Jul 6, 2023 at 10:30 AM Matthew de Detrich
> > > > <matthew.dedetr...@aiven.io.invalid> wrote:
> > > > >
> > > > > So in general testing software like Pekko is going to be problematic due
> > > > > to it being a distributed/concurrent system i.e. there are determinism (i.e.
> > > > > flaky) test issues.
> > > > > One thing that I did however notice is that in the
> > > > > github actions CI we pass arguments to help alleviate these issues (i.e.
> > > > > https://github.com/apache/incubator-pekko/blob/main/.github/workflows/nightly-builds.yml#L35-L42
> > > > > ).
> > > > > The way that ASF release process works where it compels committers to run
> > > > > tests locally has surfaced this, where as in the past the source of truth
> > > > > for tests was either in github actions CI or a in the case of Lightbend
> > > > > private machines/scripts that were specifically setup to test the
> > > > > software before a release.
> > > > >
> > > > > A final thing to note is that when someone makes a PR against Pekko,
> > > > > tests are only run on the module that has changed (this is achieved via
> > > > > https://github.com/sbt/sbt-pull-request-validator) and most of the
> > > > > flakiness occurs when you try to run all of the tests at once. For this,
> > > > > having a powerful machine helps.
> > > > >
> > > > > On Thu, Jul 6, 2023 at 10:20 AM Claude Warren, Jr
> > > > > <claude.war...@aiven.io.invalid> wrote:
> > > > >
> > > > > > My opinion is that if I check out the release code the tests should
> > > > > > pass, or there should be a list of "flaky" tests that are known to have
> > > > > > problems so I can at least verify that the failures are in them.
> > > > > >
> > > > > > On Thu, Jul 6, 2023 at 10:15 AM PJ Fanning <fannin...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > There are multiple modules. The tests for some modules are passing
> > > > > > > but for other modules they are failing.
> > > > > > >
> > > > > > > With one example:
> > > > > > >
> > > > > > > [error] (remote-tests / Test / test) sbt.TestsFailedException: Tests unsuccessful
> > > > > > >
> > > > > > > remote-tests module is in the directory of the same name.
> > > > > > >
> > > > > > > You can use this command just to run the tests in that module
> > > > > > >
> > > > > > > sbt remote-tests/test
> > > > > > >
> > > > > > > Some of the tests can be sensitive to the performance of your
> > > > > > > machine.
> > > > > > >
> > > > > > > If you continue to have trouble, maybe you could send me your full
> > > > > > > output. I don't think this public mailing list would be a good place
> > > > > > > for that large output but you can email me directly or message it
> > > > > > > using Slack.
> > > > > > >
> > > > > > >
> > > > > > > On Thu, 6 Jul 2023 at 09:06, Claude Warren, Jr
> > > > > > > <claude.war...@aiven.io.invalid> wrote:
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > While testing RC3 I did the following:
> > > > > > > >
> > > > > > > > sbt test
> > > > > > > >
> > > > > > > > the result I got was:
> > > > > > > > [info] Total number of tests run: 628
> > > > > > > > [info] Suites: completed 181, aborted 0
> > > > > > > > [info] Tests: succeeded 628, failed 0, canceled 0, ignored 6, pending 2
> > > > > > > > [info] All tests passed.
> > > > > > > > [error] (remote-tests / Test / test) sbt.TestsFailedException: Tests unsuccessful
> > > > > > > > [error] (persistence / Test / test) sbt.TestsFailedException: Tests unsuccessful
> > > > > > > > [error] (persistence-shared / Test / test) sbt.TestsFailedException: Tests unsuccessful
> > > > > > > > [error] (remote / Test / test) sbt.TestsFailedException: Tests unsuccessful
> > > > > > > > [error] (stream-tests / Test / test) sbt.TestsFailedException: Tests unsuccessful
> > > > > > > > [error] Total time: 7313 s (02:01:53), completed 5 Jul 2023, 19:10:06
> > > > > > > >
> > > > > > > > Why the success and yet the failures?
> > > > > > > >
> > > > > > > > Claude
> > > > > > >
> > > > > > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: dev-unsubscr...@pekko.apache.org
> > > > > > > For additional commands, e-mail: dev-h...@pekko.apache.org

--
Matthew de Detrich
*Aiven Deutschland GmbH*
Immanuelkirchstraße 26, 10405 Berlin
Amtsgericht Charlottenburg, HRB 209739 B
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
*m:* +491603708037
*w:* aiven.io *e:* matthew.dedetr...@aiven.io