Han-Wen Nienhuys <[email protected]> writes:

> On Sat, Feb 22, 2020 at 10:01 PM David Kastrup <[email protected]> wrote:
>> >> That's oversimplifying the staging->master push process a bit which goes
>> >> to considerable pain to make sure that its own copy of staging is still
>> >> part of the upstream staging branch before pushing the tested result.
>> >> That makes it possible to manually stop a bad staging commit from
>> >> reaching master even if it would compile: once you reset staging, no
>> >> already running Patchy process will override that decision with a
>> >> version of staging that has become stale.
>> >
>> > If there are multiple patchy processes, you'd hope they all come to
>> > the same conclusion.
>>
>> Patchy processes don't reset staging. Humans do. But our various
>> patchies run on different platforms with different version libraries.
>> That actually has turned out helpful in discovering portability problems
>> at times.
>
> How many patchies are there, and on what platforms do they run?

At the current point of time, I think that my laptop is the main
workhorse, Dan has another one he runs at times particularly when he has
committed, well, a commit of his own into staging, James does the manual
Patchy runs with visual inspection but runs the staging Patchy now at
most on weekends and at home (he used to have a scheduled Patchy at work
checking for a run every two hours, but a change in office policies
stopped that practice).

> Yes, it's an oversimplification that fits in an email so we can
> discuss next steps. My question is: are there fundamental features
> that you think are missing?

In what we are doing now? Or in what you have proposed? Since the
latter apparently is to include oversimplification, I don't see how I
could answer that without actually seeing the full version. The current
version has evolved to do a reasonable job within the framework of
people running it on their personal setup. A considerable change of
framework would obviously trigger the question again, and the per-patch
Patchy operated by James only obviously has sunk to a state where
fundamental features are not as much missing as having become
inoperative.

>> > 1) we get a reproducible test process, because everyone can use the
>> > same base images.
>>
>> Which makes it less likely that we discover portability problems. I
>> am not sure what problem you are trying to address here.
>
> It means that if CI sees an error (because it does testing on multiple
> platforms), it is trivial for me to reproduce that error, and fix it
> locally.

The thing with "multiple platforms" is that our testing does not
actually cover multiple platforms. The serious testing happens after
installers are released. The most release-critical testing is GUB going
through. The binaries and installers coming out of GUB never get to see
a single regtest except possibly manually. I see absolutely no chance
that we can change that significantly without leaving both the free and
the affordable tiers of CI services.

> By contrast, today if there is an error (see the Pango problem), we
> have to email back and forth to figure out what is going on.

Yes. But if every developer tests on the same platform, we will have to
email back and forth with users to figure out what is going on when the
stuff does not blow up on our unified platform code. We have had that
situation with floating point on Windows (or rather 32bit platforms
generally) just now. Windows-only problems are really tricky things.
So I am skeptical that a unification of test platforms among developers
will make it easier rather than harder to track down problems among us.

>> > 2) we can test against different configurations (Pango 1.44
>> > vs. 1.36, GUILE 1.8 vs 2.2) simultaneously, which catches problems
>> > like the recent Pango one earlier.
>>
>> That's definitely an advantage against our more haphazard setup now.
>> It does come at the cost of _everyone_ (or the CI system) having to
>> test _all_ pertinent configurations rather than just a personal
>> sampling if it is supposed to increase the covered base.
>
> Nobody _has_ to test all configurations. But one _can_.

If one does, the bill will come up eventually. For better or worse,
LilyPond is a real pig regarding resource usage for full builds/tests.
Our strategy so far has been working in a spotty manner, and with
volunteers giving their computers significant workouts. That's not how
you would do things in a corporate setting. But we don't have a
corporate setting.

-- 
David Kastrup
My replies have a tendency to cause friction. To help mitigating
damage, feel free to forward problematic posts to me adding a subject
like "timeout 1d" (for a suggested timeout of 1 day) or "offensive".
