Thanks a lot for the patches Ahzo. Especially fixing the file handle leak should help a lot.
I guess it's too late for bullseye now, but I can at least upload a fixed package to experimental. I'll also try to fix many of the failing tests by including sage's (large) patch to support pari 2.13 which was finished in June [1]. I have to see if I can backport that to sage 9.2 or if I update to sage 9.4 right away.

Best,
Tobias

[1] https://trac.sagemath.org/ticket/30801

On 7/31/21 8:47 PM, Ahzo wrote:
> Control: tags -1 patch
>
> Hi,
>
> the main problem making the sagemath testsuite flaky is that it randomly
> aborts due to 'Too many open files'.
> Thus only a small part of the test suite gets actually run, when the build is
> heavily parallelized.
> This can be seen by reporting not only the number of failed, but also that of
> run tests, which shows significant fluctuations.
>
> The problem occurs, because every finished, but not yet logged worker, holds
> an open fd (a pipe used to read the output of the child actually doing the
> tests).
> Thus when following a long running worker, i.e. logging its messages, while
> it is still running, so many finished tests can accumulate, that the open
> files limit (ulimit -n) is reached.
>
> However, there should be no open pipe per finished worker, as the test suite
> calls 'os.close(self.rmessages)' before waiting for logging the messages.
> So this seems to be caused by something that python does behind the scenes.
> Removing the single line 'finished.append(w)' in src/sage/doctest/forker.py
> prevents the open fd increase, though at the cost of hardly logging any test
> output.
>
> This problem can be avoided by simply logging every finished test, but no
> running one.
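
To make the accumulation described above concrete, here is a minimal sketch of the two logging strategies. It is only a toy model, not the code in src/sage/doctest/forker.py: the Worker class and both functions are hypothetical names made up for illustration, and each worker is assumed to hold exactly one pipe read end until its output has been logged.

```python
import os


class Worker:
    """Hypothetical stand-in for a doctest worker (not sage's actual class)."""

    def __init__(self, name):
        self.name = name
        # Read end stays open until the output is logged.
        self.rfd, wfd = os.pipe()
        os.write(wfd, f"output of {name}\n".encode())
        os.close(wfd)  # the "child" is done, so the worker counts as finished

    def log_and_close(self):
        # Reading the buffered output and closing the read end frees the fd.
        with os.fdopen(self.rfd) as f:
            text = f.read()
        self.rfd = None
        return text


def open_count(workers):
    """Number of workers that still hold an open read end."""
    return sum(w.rfd is not None for w in workers)


def follow_running(n_finished):
    """Finished workers pile up, unlogged, while a running one is followed."""
    workers = [Worker(f"w{i}") for i in range(n_finished)]
    peak = open_count(workers)  # one open fd per finished, unlogged worker
    logs = [w.log_and_close() for w in workers]
    return peak, logs


def log_as_finished(n_finished):
    """Each finished worker is logged right away, so fds never pile up."""
    workers, logs, peak = [], [], 0
    for i in range(n_finished):
        workers.append(Worker(f"w{i}"))
        peak = max(peak, open_count(workers))
        logs.append(workers[-1].log_and_close())
    return peak, logs
```

In this model the first strategy needs as many descriptors as there are finished workers waiting to be logged, which is what eventually hits ulimit -n, while the second never holds more than one at a time and still produces the same log output.
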
>
> With only the 0001-Report-the-number-of-total-tests-run.patch, the result is
> something like:
> Success: 5 of 71435 tests failed, up to 200 failures are tolerated
>
> Adding the dt-Do-not-follow-a-running-worker.patch, the result becomes:
> Success: 194 of 361139 tests failed, up to 200 failures are tolerated
>
> These 194 failures are pretty close to the threshold of 200, so it is not
> particularly surprising, that this can fail in some environments.
> Slightly passing this threshold triggered the build failure in this bug and
> also the one in bug #983931.
>
> Increasing the threshold to 300 should make that rather unlikely, though.
> And considering that there are more than 360 thousand tests, less than 300
> failures means more than 99.9 % of the tests succeeded.
>
> The "cython: not found" issue is trivial to fix and important, because
> otherwise 'sage --cython' does not work and there is no '--cython3' option
> (unlike e.g. the '--ipython3' option).
>
> After adding the 0002-Tolerate-up-to-300-failing-tests.patch and the
> u2-Adapt-to-python2-removal.patch the test result is:
> Success: 189 of 361139 tests failed, up to 300 failures are tolerated
>
> It would also be a good idea to include a backport of commit 5cf493ca51
> ("Avoid libgmp's new lazy allocation") in the next sagemath upload, as that
> fixes a severe memory leak (see bug #964848).
>
> As to the crashes, I can't reproduce any crash when testing
> interfaces/singular.py:
> sage -t --long --random-seed=0 src/sage/interfaces/singular.py
> [404 tests, 3.87 s]
>
> This crash also does not always happen for the reproducible builds either,
> e.g. the following log shows it first crashing and then passing this test:
> https://tests.reproducible-builds.org/debian/rbuild/bullseye/amd64/sagemath_9.2-2.rbuild.log.gz
> [...]
> sage -t --long --random-seed=0 src/sage/interfaces/singular.py
> Killed due to segmentation fault
> [...]
> sage -t --long --random-seed=0 src/sage/interfaces/singular.py
> [404 tests, 21.06 s]
> [...]
>
> However, a number of other crashes happen during every test run, but only one
> of them causes a test failure:
> sage -t --long --random-seed=0 src/sage/interfaces/tests.py
> **********************************************************************
> File "src/sage/interfaces/tests.py", line 34, in sage.interfaces.tests
> Failed example:
> subprocess.call("echo syntax error | ecl", **kwds) in (0, 255)
> Expected:
> True
> Got:
> False
> **********************************************************************
>
> Similar crashes sometimes also occur when testing interfaces/lisp.py, but
> without causing the test to fail.
> This is a problem in ecl, which crashes when both stdout and stderr are full,
> see bug #710953.
>
> Then there is a crash in nauty-gentourng triggered by
> src/sage/graphs/digraph_generators.py.
> For details see bug #991750.
>
> There are also two SIGABRT crashes in mwrank triggered by
> src/sage/interfaces/mwrank.py.
> These seem to be intentional due to invalid input.
>
> Finally, there are some python crashes (5 SIGQUIT, 1 SIGABRT, 1 SIGSEGV) that
> are all caused intentionally by the test suite.
>
> So none of these crashes are problems in sagemath itself.
>
> Regards,
> Ahzo