On Tue, 7 Jun 2022 13:05:49 GMT, Magnus Ihse Bursie <i...@openjdk.org> wrote:
> With project Skara, the ability to run a set of sanity build and test jobs
> on selected platforms was added. This functionality was driven by
> `.github/workflows/submit.yml`. Unfortunately, this file lacks any real
> structure, and contains a lot of code duplication and redundancy. This has
> made it hard to add functionality and new platforms to test, and even
> harder to debug issues. (This is hard enough as it is, since we have no
> direct access to the platforms that GHA runs on.)
>
> Since the GHA tests are important for a large subset of the community, we
> need to do better.
>
> ## GitHub Actions framework rewrite
>
> This is a complete overhaul of the GHA testing framework. I started out
> trying to just tease the old `submit.yml` apart and de-duplicate code, but
> I soon realized a much more thorough rework was needed.
>
> ### Design description
>
> The principles for the new design were to avoid code duplication and to
> improve readability of the code. The latter is extra important since the
> GHA "language" is very limited, needs a lot of quirks and workarounds, and
> is probably not well known by many OpenJDK developers. I've strived to find
> useful layers of abstraction to make the expressions as clear as possible.
>
> Unfortunately, the Workflow/Action YAML language is quite limited. There
> are two ways to avoid duplication: "local composite actions" and "callable
> workflows". They both have several limitations:
>
> * "Callable workflows" can only be used at a single level of indirection.
>   They are (apparently) inlined into the "calling workflow" at run time,
>   and as such, they are present without having to check out the source
>   code. (Which is a lengthy process.)
>
> * "Local composite actions" can use other actions, but you must start by
>   checking out the repo.
>
> To use the strengths of both kinds of sub-modules, I'm using "callable
> workflows" from `main.yml` to call `build-<platform>.yml` and `test.yml`.
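For readers new to GHA, here is a minimal sketch of how the two mechanisms can fit together. Only `main.yml` and the `build-<platform>.yml` naming come from the PR; `build-linux.yml`, the `do-build` action, the `platform` input, the triggers and the runner label are hypothetical, and the three YAML documents would live in three separate files.

```yaml
# --- .github/workflows/main.yml (hypothetical excerpt: the caller) ---
on: [push, workflow_dispatch]   # assumed triggers

jobs:
  build-linux-x64:
    # A job that calls a callable workflow uses a job-level `uses:`
    # instead of `runs-on:` and `steps:`. No checkout is needed here.
    uses: ./.github/workflows/build-linux.yml
    with:
      platform: linux-x64

---
# --- .github/workflows/build-linux.yml (hypothetical: the callee) ---
on:
  # The `workflow_call` trigger is what makes this workflow callable.
  workflow_call:
    inputs:
      platform:
        required: true
        type: string

jobs:
  build:
    runs-on: ubuntu-20.04
    steps:
      # Local composite actions are referenced by path, so the repo
      # must be checked out before any of them can be used.
      - uses: actions/checkout@v3
      - uses: ./.github/actions/do-build
        with:
          platform: ${{ inputs.platform }}

---
# --- .github/actions/do-build/action.yml (hypothetical composite action) ---
name: 'Do build'
description: 'A "high level" action, used much like a function call'
inputs:
  platform:
    description: 'Platform to build'
    required: true
runs:
  using: composite
  steps:
    - name: 'Build'
      # Placeholder for the real build steps.
      run: echo "Building for ${{ inputs.platform }}"
      # In a composite action, every `run` step must declare its shell.
      shell: bash
```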
> It is not allowed to mix "strategies" (that is, the method of
> automatically creating a test matrix) when calling callable workflows, so
> I needed some amount of duplication in `main.yml` that could otherwise
> have been avoided.
>
> All the callable workflows need to check out the source code anyway, so
> there is no real additional cost to using "local composite actions" to
> abstract parts of these workflows. (A bit of a lucky break.) I've created
> "high level" actions, each corresponding to something like a function
> call. The goal here was both to avoid duplication and to improve
> readability of the workflows.
>
> The four `build-<platform>.yml` files are very similar. But at the end of
> the day, only about 50% of the source code is shared, and the
> platform-specific changes permeate the files. So I decided to keep them
> separate, since mixing them all into one would have made a mess, given the
> lack of proper abstraction mechanisms. But that also means that if we
> change platform-independent build code, we need to remember to update it
> in all four places.
>
> In the strictest sense, this is a "refactoring", in that the functionality
> should be equal to the old `submit.yml`. The same platforms should build,
> with the same arguments, and the same tests should run. When I look at the
> code now, I see lots of potential for improvement here, by rethinking what
> we actually run. But let's save that discussion for the next PR.
>
> There is one major change, though. Windows is no longer running on Cygwin,
> but on MSYS2. This was not really triggered by the recurring build issues
> on Cygwin (though those certainly helped convince me I had made the right
> choice), but by the sheer impossibility of getting Cygwin to behave as a
> normal Unix shell on GHA Windows hosts.
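As a rough illustration of what running the Windows build under MSYS2 can look like in a workflow, a minimal sketch follows. The PR does not say how MSYS2 is provisioned; the third-party `msys2/setup-msys2` action, the runner label, the package list and the final placeholder step are all assumptions.

```yaml
# Hypothetical excerpt of a Windows build job using MSYS2 instead of Cygwin.
jobs:
  build-windows:
    runs-on: windows-2019
    defaults:
      run:
        # Execute every `run:` step in the MSYS2 bash set up below.
        shell: msys2 {0}
    steps:
      - uses: actions/checkout@v3
      - uses: msys2/setup-msys2@v2
        with:
          msystem: MSYS
          update: true
          # Unix tools the build scripts rely on (assumed package list).
          install: autoconf tar unzip zip make
          # Start from a minimal PATH rather than inheriting the full
          # Windows PATH (assumed setting).
          path-type: minimal
      - name: 'Configure (placeholder)'
        run: bash configure --help
```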
> I spent countless hours trying to work around its limitations: by setting
> `SHELLOPTS=igncr`, by running `set +o posix` to turn off the POSIX
> compliance mode that kept turning itself on and made bash choke on several
> of our scripts, by playing tricks with the `PATH`, but in the end to no
> avail. There was no single combination of hacks and workarounds that could
> get us through the entire chain from configure, to build, to testing. (The
> old solution used PowerShell instead to get around these limitations.) I'm
> happy to report that I have had absolutely zero issues with MSYS2 since I
> made the switch (and understood how to set the `PATH` properly), and I'm
> seriously considering changing my stance and recommending MSYS2 instead of
> Cygwin as the primary winenv for building the JDK.
>
> ### Example run
>
> A good example of what a run looks like with the new GHA system is
> [the run for this PR](https://github.com/magicus/jdk/actions/runs/2454577164).
>
> ### New features
>
> While the primary focus was to convert the old system to a new framework
> that is more amenable to development, and to hold off on further
> enhancements until later, I have already added a few new features in this
> PR. Most of them are related to needs that arose while developing it.
>
> * A build failure summary, similar to the recently added test failure
>   summary, is added when the build step fails
>
> * Test reporting has been extended to all platforms, including Windows
>
> * Test reporting has been improved slightly, and has received multiple bug
>   fixes
>
> * All artifacts are now available for individual download. This includes:
>
>   * The build bundles, per platform
>   * The test results, per platform and test suite
>   * Build failure logs, in case of build failure
>
>   The build bundles have a retention period of 24 h, but the rest use
>   GitHub's default retention period (currently 90 days).
>   The idea is that you can use GHA to download builds for platforms you
>   might not have access to, but beyond that, keeping the builds around does
>   not make sense. GitHub currently provides free, unlimited storage (within
>   the retention period) for artifacts, so we can afford this.
>
> * The GHA process starts up much faster, which means that e.g. a build
>   failure on an exotic platform will show up earlier. This will not really
>   affect the overall run time, though, since that is bounded by factors
>   such as queuing for workers and waiting for tests with somewhat arbitrary
>   run times to finish.
>
> ### Additional changes outside GHA
>
> I also needed to make a few tweaks to the build system to play nicely with
> the new GHA code.
>
> * The build failure summary is now stored in
>   `build/$BUILD/make-support/failure-summary.log`
>
> * The configure summary now indicates which devkit or sysroot is used, if
>   any
>
> * The `--with-sysroot` argument is now properly normalized
>
> ### Test failures
>
> A handful of tests, which rely on shell behavior, turned out to fail on
> Windows when running under MSYS2. I have filed separate bugs, and
> submitted PRs, to get these fixed:
>
> * https://bugs.openjdk.org/browse/JDK-8287902
>
> * https://bugs.openjdk.org/browse/JDK-8287895

make/conf/github-actions.conf line 31:

> 29:
> 30: JTREG_URL=https://ci.adoptopenjdk.net/view/Dependencies/job/dependency_pipeline/330/artifact/jtreg/jtreg-6.1+1.tar.gz
> 31: JTREG_SHA256=ccfa21f54bb173f818a5a8d93f77d49301f275f0677c9f914297046c910c5129

This seems questionable, and adds a very suspect blocking dependency on
that website whenever we want to update the version of jtreg to be used.

-------------

PR: https://git.openjdk.java.net/jdk/pull/9063
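To make the concern about `JTREG_URL` concrete, a hypothetical composite action consuming the two quoted settings might look like this. The action name, the use of `curl` and `sha256sum` (available on Linux runners; macOS would need `shasum -a 256`), and the assumption that the conf file can be sourced by bash are all illustrative. The pinned hash guards the integrity of the download, but the tarball must still be published at, and fetched from, that external site.

```yaml
# Hypothetical .github/actions/get-jtreg/action.yml
name: 'Get JTReg'
description: 'Download jtreg and verify it against the pinned checksum (sketch)'
runs:
  using: composite
  steps:
    - name: 'Download and verify jtreg'
      shell: bash
      run: |
        # Assumes the conf file contains only comments and KEY=VALUE lines,
        # so sourcing it defines JTREG_URL and JTREG_SHA256.
        source make/conf/github-actions.conf
        # Fail on HTTP errors instead of saving an error page.
        curl -fsSL -o jtreg.tar.gz "$JTREG_URL"
        # Abort if the archive does not match the pinned SHA-256 hash.
        echo "$JTREG_SHA256  jtreg.tar.gz" | sha256sum -c -
        mkdir -p jtreg
        tar -xzf jtreg.tar.gz -C jtreg
```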