Re: RFR: 8287906: Rewrite of GitHub Actions (GHA) sanity tests

Jonathan Gibbons Tue, 07 Jun 2022 13:10:22 -0700

On Tue, 7 Jun 2022 13:05:49 GMT, Magnus Ihse Bursie <[email protected]> wrote:


> With project Skara, the ability to run a set of sanity build and test jobs on 
> selected platforms was added. This functionality was driven by 
> `.github/workflows/submit.yml`. This file unfortunately lacks any real 
> structure, and contains a lot of code duplication and redundancy. This has 
> made it hard to add functionality, and new platforms to test, and it has made 
> it even harder to debug issues. (This is hard enough as it is, since we have 
> no direct access to the platforms that GHA runs on.)
> 
> Since the GHA tests are important for a large subset of the community, we 
> need to do better. 
> 
> ## GitHub Actions framework rewrite
>  
> This is a complete overhaul of the GHA testing framework. I started out 
> trying to just tease the old `submit.yml` apart, trying to de-duplicate code, 
> but I soon realized a much more thorough rework was needed.
> 
> ### Design description
> 
> The principle for the new design was to avoid code duplication, and to 
> improve readability of the code. The latter is extra important since the GHA 
> "language" is very limited, needs a lot of quirks and workarounds, and is 
> probably not well known by many OpenJDK developers. I've strived to find 
> useful layers of abstraction to make the expressions as clear as possible.
> 
> Unfortunately, the Workflow/Action YAML language is quite limited. There are 
> two ways to avoid duplication, "local composite actions" and "callable 
> workflows". They both have several limitations:
> 
>  * "Callable workflows" can only be used with a single level of indirection. They are 
> (apparently) inlined into the "calling workflow" at run time, and as such, 
> they are present without having to check out the source code. (Which is a 
> lengthy process.)
> 
>  * "Local composite actions" can use other actions, but you must start by 
> checking out the repo.
> 
> To use the strength of both kinds of sub-modules, I'm using "callable 
> workflows" from `main.yml` to call `build-<platform>.yml` and `test.yml`. It 
> is not allowed to mix "strategies" (that is, the method of automatically 
> creating a test matrix) when calling callable workflows, so I needed to have 
> some amount of duplication in `main.yml` that could have been avoided 
> otherwise.
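> 
> As a rough illustration of the "callable workflow" mechanism described 
> above, here is a minimal sketch. The job names and the `platform` input are 
> assumptions made for the example and are not taken from the actual 
> `main.yml`; only the `uses:` and `workflow_call` syntax is standard GHA.
> 
> ```yaml
> # .github/workflows/main.yml (sketch; job and input names are hypothetical)
> on:
>   push:
> 
> jobs:
>   build-linux-x64:
>     # A job-level "uses" calls a workflow file from the same repository.
>     uses: ./.github/workflows/build-linux.yml
>     with:
>       platform: linux-x64        # hypothetical input
> 
>   test-linux-x64:
>     needs: build-linux-x64       # run only after the build job succeeds
>     uses: ./.github/workflows/test.yml
>     with:
>       platform: linux-x64        # hypothetical input
> ---
> # .github/workflows/build-linux.yml (sketch of the called side)
> on:
>   workflow_call:
>     inputs:
>       platform:
>         required: true
>         type: string
> 
> jobs:
>   build:
>     runs-on: ubuntu-latest
>     steps:
>       - uses: actions/checkout@v3
>       - run: echo "Building for ${{ inputs.platform }}"
> ```
> 
> Each platform gets its own explicitly written pair of jobs here, which is 
> the kind of duplication that a shared matrix strategy would otherwise remove.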
> 
> All the callable workflows need to check out the source code anyway, so there 
> is no real additional cost of using "local composite actions" for abstraction 
> of these workflows. (A bit of a lucky break.) I've created "high level" 
> actions, corresponding to something like a function call. The goal here was 
> both to avoid duplication, and to improve readability of the workflows.
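> 
> To make the "function call" analogy concrete, a local composite action 
> might look roughly like the sketch below. The action name `do-build`, the 
> `make-target` input, and the target value are illustrative assumptions, not 
> a description of the files in this PR.
> 
> ```yaml
> # .github/actions/do-build/action.yml (sketch; names are hypothetical)
> name: 'Do build'
> description: 'Run one make target'
> inputs:
>   make-target:
>     description: 'Make target to build'
>     required: true
> runs:
>   using: composite
>   steps:
>     - name: 'Build'
>       run: make ${{ inputs.make-target }}
>       shell: bash                # composite "run" steps must name a shell
> ---
> # Steps of a job inside a workflow (sketch of the calling side)
> steps:
>   # The repo must be checked out first, otherwise the local action
>   # directory does not exist on the runner.
>   - uses: actions/checkout@v3
>   - uses: ./.github/actions/do-build
>     with:
>       make-target: 'product-bundles'   # illustrative value
> ```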
> 
> The four `build-<platform>.yml` files are very similar. But at the end of the 
> day, only about 50% of the source code is shared, and the platform specific 
> changes permeate the files. So I decided to keep them separate, since 
> mixing them all into one would have made a mess, due to the lack of proper 
> abstraction mechanisms. But that also means that if we change platform 
> independent code in building, we need to remember to update it in all four 
> places.
> 
> In the strictest sense, this is a "refactoring" in that the functionality 
> should be equal to the old `submit.yml`. The same platforms should build, 
> with the same arguments, and the same tests should run. When I look at the 
> code now, I see lots of potential for improvement here, by rethinking what we 
> do run. But let's save that discussion for the next PR.
> 
> There is one major change, though. Windows is no longer running on Cygwin, 
> but on MSYS2. This was not really triggered by the recurring build issues on 
> Cygwin (though that certainly did help me in thinking I made the right 
> choice), but the sheer impossibility of getting Cygwin to behave as a normal 
> unix shell on GHA Windows hosts. I spent countless hours trying to work around 
> limitations, by setting `SHELLOPTS=igncr`, by running `set +x posix` to turn 
> off the POSIX compliance mode that kept turning on by itself and made bash 
> choke on several of our scripts, by playing tricks with the `PATH`, but in 
> the end to no avail. There was no single combination of hacks and 
> workarounds that could get us past the entire chain from configure, to build, 
> to testing. (The old solution used PowerShell instead to get around these 
> limitations.) I'm happy to report that I have had absolutely zero issues with 
> MSYS2 since I made the switch (and understood how to set the PATH properly), 
> and I'm seriously considering switching stance to recommend using MSYS2 
> instead of Cygwin as the primary winenv for building the JDK.
> 
> ### Example run
> 
> A good example of what a run looks like with the new GHA system is [the run 
> for this PR](https://github.com/magicus/jdk/actions/runs/2454577164).
> 
> ### New features
> 
> While the primary focus was to convert the old system to a new framework, 
> more accommodating to development, and to wait with further enhancements for 
> the future, I have added a few features already in this PR. Most of 
> them are related to needs that arose during development of this PR.
> 
> * A build failure summary, similar to the recently added test failure 
> summary, is added when the build step fails
> 
> * The test reporting has been extended to all platforms, including Windows
> 
> * Test reporting has been improved slightly, and gotten multiple bug fixes
> 
> * All artifacts are now available for individual download. This includes:
> 
>   * The build bundles, per platform
>   * The test results, per platform and test suite
>   * Build failure logs, in case of build failure
> 
>   The build bundles have a retention period of 24 h, but the rest uses 
> GitHub's default retention period (currently 90 days). The idea is that you 
> can use GHA to download builds for platforms you might not have access to, 
> but after that, conserving the builds does not make sense. GitHub currently 
> provides free, unlimited storage (within the retention period) for artifacts, 
> so we can afford this.
> 
> * The GHA process starts up much faster, which means that e.g. a build failure 
> on an exotic platform will show up earlier. This will not really affect the 
> overall run time though, since it is bounded by variables such as queuing for 
> workers, and waiting on tests with somewhat arbitrary run times to finish.
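> 
> For the artifact retention mentioned in the list above, a sketch of how 
> different retention periods can be expressed with the standard 
> `actions/upload-artifact` action follows. The artifact names and paths are 
> assumptions for illustration; only the `retention-days` input is the point.
> 
> ```yaml
> steps:
>   - name: 'Upload build bundles'
>     uses: actions/upload-artifact@v3
>     with:
>       name: bundles-linux-x64          # hypothetical artifact name
>       path: bundles                    # hypothetical path
>       retention-days: 1                # keep for 24 h only
> 
>   - name: 'Upload test results'
>     uses: actions/upload-artifact@v3
>     with:
>       name: results-linux-x64-tier1    # hypothetical artifact name
>       path: results                    # hypothetical path
>       # retention-days omitted: the repository default applies
> ```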
> 
> ### Additional changes outside GHA
> 
> I also needed to make a few tweaks to the build system to play nice with the 
> new GHA code.
> 
> * The build failure summary is now stored in 
> build/$BUILD/make-support/failure-summary.log
> 
> * The configure summary now indicates what devkit or sysroot is used, if any
> 
> * The --with-sysroot argument is now properly normalized
> 
> ### Test failures
> 
> A handful of tests, which rely on shell behavior, turned out to fail on 
> Windows when running under MSYS2. I have filed separate bugs, and submitted 
> PRs, to get these fixed:
> 
> * https://bugs.openjdk.org/browse/JDK-8287902
> 
> * https://bugs.openjdk.org/browse/JDK-8287895

make/conf/github-actions.conf line 31:

> 29: 
> 30: JTREG_URL=https://ci.adoptopenjdk.net/view/Dependencies/job/dependency_pipeline/330/artifact/jtreg/jtreg-6.1+1.tar.gz
> 31: JTREG_SHA256=ccfa21f54bb173f818a5a8d93f77d49301f275f0677c9f914297046c910c5129

This seems questionable, and adds a very suspect blocking dependency on that 
website for when we want to update the version of jtreg to be used.

-------------

PR: https://git.openjdk.java.net/jdk/pull/9063
