On Fri, 10 Jun 2022 09:34:53 GMT, Magnus Ihse Bursie <i...@openjdk.org> wrote:
>> With project Skara, the ability to run a set of sanity build and test jobs
>> on selected platforms was added. This functionality was driven by
>> `.github/workflows/submit.yml`. This file unfortunately lacks any real
>> structure, and contains a lot of code duplication and redundancy. This has
>> made it hard to add functionality, and new platforms to test, and it has
>> made it even harder to debug issues. (This is hard enough as it is, since we
>> have no direct access to the platforms that GHA runs on.)
>>
>> Since the GHA tests are important for a large subset of the community, we
>> need to do better.
>>
>> ## GitHub Actions framework rewrite
>>
>> This is a complete overhaul of the GHA testing framework. I started out
>> trying to just tease the old `submit.yml` apart, trying to de-duplicate
>> code, but I soon realized a much more thorough rework was needed.
>>
>> ### Design description
>>
>> The principle for the new design was to avoid code duplication, and to
>> improve readability of the code. The latter is extra important since the GHA
>> "language" is very limited, needs a lot of quirks and workarounds, and is
>> probably not well known by many OpenJDK developers. I've strived to find
>> useful layers of abstraction to make the expressions as clear as possible.
>>
>> Unfortunately, the Workflow/Action YAML language is quite limited. There are
>> two ways to avoid duplication, "local composite actions" and "callable
>> workflows". They both have several limitations:
>>
>> * "Callable workflows" can only be used in a single redirection. They are
>> (apparently) inlined into the "calling workflow" at run time, and as such,
>> they are present without having to check out the source code. (Which is a
>> lengthy process.)
>>
>> * "Local composite actions" can use other actions, but you must start by
>> checking out the repo.
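
For readers less familiar with the two mechanisms quoted above, here is a minimal sketch of how they differ. It is not taken from the PR: the job names, the `platform` input, the `build-linux.yml` file name and the `do-build` action are all hypothetical, and the three files are shown in one block separated by `---` only for brevity.

```yaml
# .github/workflows/main.yml (caller; names are illustrative)
on: push
jobs:
  build-linux:
    # A "callable workflow" is referenced at job level, with no checkout step
    uses: ./.github/workflows/build-linux.yml
    with:
      platform: linux-x64

---
# .github/workflows/build-linux.yml (callee, hypothetical file name)
on:
  workflow_call:
    inputs:
      platform:
        required: true
        type: string
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # A local composite action is read from the checked-out tree,
      # so the checkout has to come first
      - uses: actions/checkout@v3
      - uses: ./.github/actions/do-build   # hypothetical composite action
        with:
          platform: ${{ inputs.platform }}

---
# .github/actions/do-build/action.yml (hypothetical local composite action)
name: 'Do build'
description: 'Placeholder build step'
inputs:
  platform:
    description: 'Platform to build for'
    required: true
runs:
  using: composite
  steps:
    - run: echo "Building for ${{ inputs.platform }}"
      shell: bash
```

The point of the sketch is only the layering: the callable workflow is picked up without a checkout, while the composite action becomes usable once the repository has been checked out inside the called workflow.
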
>>
>> To use the strength of both kinds of sub-modules, I'm using "callable
>> workflows" from `main.yml` to call `build-<platform>.yml` and `test.yml`. It
>> is not allowed to mix "strategies" (that is, the method of automatically
>> creating a test matrix) when calling callable workflows, so I needed to have
>> some amount of duplication in `main.yml` that could have been avoided
>> otherwise.
>>
>> All the callable workflows need to check out the source code anyway, so
>> there is no real additional cost of using "local composite actions" for
>> abstraction of these workflows. (A bit of a lucky break.) I've created "high
>> level" actions, corresponding to something like a function call. The goal
>> here was both to avoid duplication, and to improve readability of the
>> workflows.
>>
>> The four `build-<platform>.yml` files are very similar. But at the end of
>> the day, only like 50% of the source code is shared, and the platform
>> specific changes permeate the files. So I decided to keep them separate,
>> since mixing them all into one would have made a mess, due to the lack of
>> proper abstraction mechanisms. But that also means that if we change platform
>> independent code in building, we need to remember to update it in all four
>> places.
>>
>> In the strictest sense, this is a "refactoring" in that the functionality
>> should be equal to the old `submit.yml`. The same platforms should build,
>> with the same arguments, and the same tests should run. When I look at the
>> code now, I see lots of potential for improvement here, by rethinking what
>> we do run. But let's save that discussion for the next PR.
>>
>> There is one major change, though. Windows is no longer running on Cygwin,
>> but on MSYS2. This was not really triggered by the recurring build issues on
>> Cygwin (though that certainly did help me in thinking I made the right
>> choice), but the sheer impossibility of getting Cygwin to behave as a normal
>> unix shell on GHA Windows hosts.
>> I spent countless hours trying to work around
>> limitations, by setting `SHELLOPTS=igncr`, by running `set +o posix` to turn
>> off the POSIX compliance mode that kept turning on by itself and made bash
>> choke on several of our scripts, by playing tricks with the `PATH`, but in
>> the end to no avail. There was no single combination of hacks and
>> workarounds that could get us past the entire chain from configure, to
>> build, to testing. (The old solution used PowerShell instead to get around
>> these limitations.) I'm happy to report that I have had absolutely zero
>> issues with MSYS2 since I made the switch (and understood how to set the
>> PATH properly), and I'm seriously considering switching stance to recommend
>> using MSYS2 instead of Cygwin as the primary winenv for building the JDK.
>>
>> ### Example run
>>
>> A good example of what a run looks like with the new GHA system is [the run
>> for this PR](https://github.com/magicus/jdk/actions/runs/2454577164).
>>
>> ### New features
>>
>> While the primary focus was to convert the old system to a new framework,
>> more accommodating to development, and to wait with further enhancements for
>> the future, I have added a few features already in this PR. Most
>> of them are related to needs that arose during development of this PR.
>>
>> * A build failure summary, similar to the recently added test failure
>> summary, is added when the build step fails
>>
>> * The test reporting has been extended to all platforms, including Windows
>>
>> * Test reporting has been improved slightly, and gotten multiple bug fixes
>>
>> * All artifacts are now available for individual download. This includes:
>>
>>   * The build bundles, per platform
>>   * The test results, per platform and test suite
>>   * Build failure logs, in case of build failure
>>
>> The build bundles have a retention period of 24 h, but the rest uses
>> GitHub's default retention period (currently 90 days).
>> The idea is that you
>> can use GHA to download builds for platforms you might not have access to,
>> but after that, conserving the builds does not make sense. GitHub currently
>> provides free, unlimited storage (within the retention period) for
>> artifacts, so we can afford this.
>>
>> * The GHA process starts up much faster, which means that e.g. a build
>> failure on an exotic platform will show up earlier. This will not really
>> affect the overall run time though, since it is bounded by variables such as
>> queuing for workers, and waiting on tests with somewhat arbitrary run
>> times to finish.
>>
>> ### Additional changes outside GHA
>>
>> I also needed to make a few tweaks to the build system to play nice with the
>> new GHA code.
>>
>> * The build failure summary is now stored in
>> build/$BUILD/make-support/failure-summary.log
>>
>> * The configure summary now indicates what devkit or sysroot is used, if any
>>
>> * The --with-sysroot argument is now properly normalized
>>
>> ### Test failures
>>
>> A handful of tests, which rely on shell behavior, turned out to fail on
>> Windows when running under MSYS2. I have filed separate bugs, and submitted
>> PRs, to get these fixed:
>>
>> * https://bugs.openjdk.org/browse/JDK-8287902
>>
>> * https://bugs.openjdk.org/browse/JDK-8287895
>
> Magnus Ihse Bursie has updated the pull request incrementally with one
> additional commit since the last revision:
>
>   Fix test failure regex

The jtreg changes to build with MSYS2 should now be available in the main jtreg repo.

-------------

PR: https://git.openjdk.org/jdk/pull/9063