On Thu, 9 Jun 2022 12:57:05 GMT, Magnus Ihse Bursie <i...@openjdk.org> wrote:
>> With project Skara, the ability to run a set of sanity build and test jobs >> on selected platforms was added. This functionality was driven by >> `.github/workflows/submit.yml`. This file unfortunately lacks any real >> structure, and contains a lot of code duplication and redundancy. This has >> made it hard to add functionality, and new platforms to test, and it has >> made it even harder to debug issues. (This is hard enough as it is, since we >> have no direct access to the platforms that GHA runs on.) >> >> Since the GHA tests are important for a large subset of the community, we >> need to do better. >> >> ## GitHub Actions framework rewrite >> >> This is a complete overhaul of the GHA testing framework. I started out >> trying to just tease the old `submit.yml` apart, trying to de-duplicate >> code, but I soon realized a much more thorough rework was needed. >> >> ### Design description >> >> The principle for the new design was to avoid code duplication, and to >> improve readability of the code. The latter is extra important since the GHA >> "language" is very limited, needs a lot of quirks and workarounds, and is >> probably not well known by many OpenJDK developers. I've strived to find >> useful layers of abstraction to make the expressions as clear as possible. >> >> Unfortunately, the Workflow/Action YAML language is quite limited. There are >> two ways to avoid duplication, "local composite actions" and "callable >> workflows". They both have several limitations: >> >> * "Callable workflows" can only be used in a single redirection. They are >> (apparently) inlined into the "calling workflow" at run time, and as such, >> they are present without having to check out the source code. (Which is a >> lengthy process.) >> >> * "Local composite actions" can use other actions, but you must start by >> checking out the repo. >> >> To use the strength of both kinds of sub-modules, I'm using "callable >> workflows" from `main.yml` to call `build-<platform>.yml` and `test.yml`. It >> is not allowed to mix "strategies" (that is, the method of automatically >> creating a test matrix) when calling callable workflows, so I needed to have >> some amount of duplication in `main.yml` that could have been avoided >> otherwise. >> >> All the callable workflows need to check out the source code anyway, so >> there is no real additional cost of using "local composite actions" for >> abstraction of these workflows. (A bit of a lucky break.) I've created "high >> level" actions, corresponding to something like a function call. The goal >> here was both to avoid duplication, and to improve readability of the >> workflows. >> >> The four `build-<platform>.yml` files are very similar. But in the end of >> the day, only like 50% of the source code is shared, and the platform >> specific changes permeate the files. So I decided to keep them separately, >> since mixing them all into one would have made a mess, due to the lack of >> proper abstraction mechanisms. But that also mean that if we change platform >> independent code in building, we need to remember to update it in all four >> places. >> >> In the strictest sense, this is a "refactoring" in that the functionality >> should be equal to the old `submit.yml`. The same platforms should build, >> with the same arguments, and the same tests should run. When I look at the >> code now, I see lots of potential for improvement here, by rethinking what >> we do run. But let's save that discussion for the next PR. >> >> There is one major change, though. Windows is no longer running on Cygwin, >> but on MSYS2. This was not really triggered by the recurring build issues on >> Cygwin (though that certainly did help me in thinking I made the right >> choice), but the sheer impossibility of getting Cygwin to behave as a normal >> unix shell on GHA Windows hosts. I spent countless hours trying to work out >> limitations, by setting `SHELLOPTS=igncr`, by running `set +x posix` to turn >> of the POSIX compliance mode that kept turning on by itself and made bash >> choke on several of our scripts, by playing tricks with the `PATH`, but in >> the end to no avail. There were no single combination of hacks and >> workarounds that could get us past the entire chain from configure, to >> build, to testing. (The old solution user PowerShell instead to get around >> these limitations.) I'm happy to report that I have had absolutely zero >> issues with MSYS2 since I made the switch (and understood how to set the >> PATH properly), and I'm seriously c onsidering switching stance to recommend using MSYS2 instead of Cygwin as the primary winenv for building the JDK. >> >> ### Example run >> >> A good example on how a run looks like with the new GHA system is [the run >> for this PR](https://github.com/magicus/jdk/actions/runs/2454577164). >> >> ### New features >> >> While the primary focus was to convert the old system to a new framework, >> more accommodating to development, and to wait with further enhancements for >> the future, I have made a few additional features already in this PR. Most >> of them are related to needs that arose during development of this PR. >> >> * A build failure summary, similar to the recently added test failure >> summary, is added when the build step fails >> >> * The test reporting has been extended to all platforms, including Windows >> >> * Test reporting has been improved slightly, and gotten multiple bug fixes >> >> * All artifacts are now available for individual download. This includes: >> >> * The build bundles, per platform >> * The test results, per platform and test suite >> * Build failure logs, in case of build failure >> >> The build bundles have a retention period of 24 h, but the rest uses >> GitHub's default retention period (currently 90 days). The idea is that you >> can use GHA to download builds for platforms you might not have access to, >> but after that, conserving the builds does not make sense. GitHub currently >> provides free, unlimited storage (within the retention period) for >> artifacts, so we can afford this. >> >> * The GHA process starts up much faster, which mean that e.g. a build >> failure on an exotic platform will show up earlier. This will not really >> affect the overall run time though, since it is bounded by variables such as >> queuing for workers, and waiting on tests with somewhat arbitrarily run >> times to finish. >> >> ### Additional changes outside GHA >> >> I also needed to make a few tweaks to the build system to play nice with the >> new GHA code. >> >> * The build failure summary is now stored in >> build/$BUILD/make-support/failure-summary.log >> >> * The configure summary now indicates what devkit or sysroot is used, if any >> >> * The --with-sysroot argument is now properly normalized >> >> ### Test failures >> >> A handful of tests, which relies on shell behavior, turned out to fail on >> Windows when running under MSYS2. I have filed separate bugs, and submitted >> PRs, to get these fixed: >> >> * https://bugs.openjdk.org/browse/JDK-8287902 >> >> * https://bugs.openjdk.org/browse/JDK-8287895 > > Magnus Ihse Bursie has updated the pull request incrementally with one > additional commit since the last revision: > > Apparently that was not a legal reference to actions/checkout. Try another > way. This all looks very nice. I have a few inline comments. The general one is that we need to make sure: - jtreg needs to be built from its mainline at a given tag; - tests need to pass (I assume you are waiting for test fixes to land first); .github/actions/get-bootjdk/action.yml line 26: > 24: # > 25: > 26: name: 'Get BootJDK' Here and later, polishing: "BootJDK" -> "boot JDK"? .github/actions/get-bundles/action.yml line 103: > 101: jdk_dir="$(cygpath $jdk_dir)" > 102: symbols_dir="$(cygpath $symbols_dir)" > 103: tests_dir="$(cygpath $tests_dir)" Here, and maybe later: how safe it is to use Cygwin utilities like `cygpath` in MSYS2 context? .github/actions/get-jtreg/action.yml line 55: > 53: ref: 7903206-build-on-msys2 > 54: path: jtreg/src > 55: if: steps.get-cached-jtreg.outputs.cache-hit != 'true' Just leaving the comment here so we don't forget to change it to mainline jtreg location before integrating this PR :) .github/actions/get-msys/action.yml line 26: > 24: # > 25: > 26: name: 'Get msys' Here and later, polishing: call it consistently "msys2" or "MSYS2"? .github/workflows/build-cross-compile.yml line 89: > 87: sudo apt-get install gcc-${{ inputs.apt-gcc-version }} g++-${{ > inputs.apt-gcc-version }} libxrandr-dev${{ inputs.apt-architecture }} > libxtst-dev${{ inputs.apt-architecture }} libcups2-dev${{ > inputs.apt-architecture }} libasound2-dev${{ inputs.apt-architecture }} > 88: sudo update-alternatives --install /usr/bin/gcc gcc > /usr/bin/gcc-10 100 --slave /usr/bin/g++ g++ /usr/bin/g++-10 > 89: sudo apt-get install gcc-10-${{ matrix.gnu-arch }}-linux-gnu${{ > matrix.gnu-abi}}=10.3.0-1ubuntu1~20.04cross1 g++-10-${{ matrix.gnu-arch > }}-linux-gnu${{ matrix.gnu-abi}}=10.3.0-1ubuntu1~20.04cross1 Should `10.3.0-1ubuntu1~20.04cross1` also go into common "var", like `apt-gcc-version` did? .github/workflows/build-cross-compile.yml line 111: > 109: buster > 110: sysroot > 111: http://httpredir.debian.org/debian/ While we are at it, we can change to `https://` in cases like these... .github/workflows/main.yml line 138: > 136: apt-gcc-version: '10-multilib' > 137: apt-architecture: ':i386' > 138: apt-add-architecture: 'sudo dpkg --add-architecture i386' Feels a bit weird to have the entire command line here. I'd expect to see something like `apt-add-architecture: i386`. make/InitSupport.gmk line 456: > 454: $(PRINTF) "\n* All command lines available in > $(MAKESUPPORT_OUTPUTDIR)/failure-logs.\n" ; \ > 455: $(PRINTF) "=== End of repeated output ===\n" ; \ > 456: ) >> $(MAKESUPPORT_OUTPUTDIR)/failure-summary.log \ Changes in this file look like a generic build system improvement, so maybe it should be forked out as separate issue? In case this part of build system change regresses something, I would be odd to track it down to GHA workflow improvement. ------------- PR: https://git.openjdk.org/jdk/pull/9063