On Thu, 9 Jun 2022 12:57:05 GMT, Magnus Ihse Bursie <i...@openjdk.org> wrote:

>> With project Skara, the ability to run a set of sanity build and test jobs 
>> on selected platforms was added. This functionality was driven by 
>> `.github/workflows/submit.yml`. This file unfortunately lacks any real 
>> structure, and contains a lot of code duplication and redundancy. This has 
>> made it hard to add functionality, and new platforms to test, and it has 
>> made it even harder to debug issues. (This is hard enough as it is, since we 
>> have no direct access to the platforms that GHA runs on.)
>> 
>> Since the GHA tests are important for a large subset of the community, we 
>> need to do better. 
>> 
>> ## GitHub Actions framework rewrite
>>  
>> This is a complete overhaul of the GHA testing framework. I started out 
>> trying to just tease the old `submit.yml` apart, trying to de-duplicate 
>> code, but I soon realized a much more thorough rework was needed.
>> 
>> ### Design description
>> 
>> The principle for the new design was to avoid code duplication, and to 
>> improve readability of the code. The latter is extra important since the GHA 
>> "language" is very limited, needs a lot of quirks and workarounds, and is 
>> probably not well known by many OpenJDK developers. I've strived to find 
>> useful layers of abstraction to make the expressions as clear as possible.
>> 
>> Unfortunately, the Workflow/Action YAML language is quite limited. There are 
>> two ways to avoid duplication, "local composite actions" and "callable 
>> workflows". They both have several limitations:
>> 
>>  * "Callable workflows" can only be used in a single redirection. They are 
>> (apparently) inlined into the "calling workflow" at run time, and as such, 
>> they are present without having to check out the source code. (Which is a 
>> lengthy process.)
>> 
>>  * "Local composite actions" can use other actions, but you must start by 
>> checking out the repo.
>> 
>> To use the strength of both kinds of sub-modules, I'm using "callable 
>> workflows" from `main.yml` to call `build-<platform>.yml` and `test.yml`. It 
>> is not allowed to mix "strategies" (that is, the method of automatically 
>> creating a test matrix) when calling callable workflows, so I needed to have 
>> some amount of duplication in `main.yml` that could have been avoided 
>> otherwise.
>> 
>> All the callable workflows need to check out the source code anyway, so 
>> there is no real additional cost of using "local composite actions" for 
>> abstraction of these workflows. (A bit of a lucky break.) I've created "high 
>> level" actions, corresponding to something like a function call. The goal 
>> here was both to avoid duplication, and to improve readability of the 
>> workflows.
>> 
>> The four `build-<platform>.yml` files are very similar. But in the end of 
>> the day, only like 50% of the source code is shared, and the platform 
>> specific changes permeate the files. So I decided to keep them separately, 
>> since mixing them all into one would have made a mess, due to the lack of 
>> proper abstraction mechanisms. But that also mean that if we change platform 
>> independent code in building, we need to remember to update it in all four 
>> places.
>> 
>> In the strictest sense, this is a "refactoring" in that the functionality 
>> should be equal to the old `submit.yml`. The same platforms should build, 
>> with the same arguments, and the same tests should run. When I look at the 
>> code now, I see lots of potential for improvement here, by rethinking what 
>> we do run. But let's save that discussion for the next PR.
>> 
>> There is one major change, though. Windows is no longer running on Cygwin, 
>> but on MSYS2. This was not really triggered by the recurring build issues on 
>> Cygwin (though that certainly did help me in thinking I made the right 
>> choice), but the sheer impossibility of getting Cygwin to behave as a normal 
>> unix shell on GHA Windows hosts. I spent countless hours trying to work out 
>> limitations, by setting `SHELLOPTS=igncr`, by running `set +x posix` to turn 
>> of the POSIX compliance mode that kept turning on by itself and made bash 
>> choke on several of our scripts, by playing tricks with the `PATH`, but in 
>> the end to no avail. There were no single combination of hacks and 
>> workarounds that could get us past the entire chain from configure, to 
>> build, to testing. (The old solution user PowerShell instead to get around 
>> these limitations.) I'm happy to report that I have had absolutely zero 
>> issues with MSYS2 since I made the switch (and understood how to set the 
>> PATH properly), and I'm seriously c
 onsidering switching stance to recommend using MSYS2 instead of Cygwin as the 
primary winenv for building the JDK.
>> 
>> ### Example run
>> 
>> A good example on how a run looks like with the new GHA system is [the run 
>> for this PR](https://github.com/magicus/jdk/actions/runs/2454577164).
>> 
>> ### New features
>> 
>> While the primary focus was to convert the old system to a new framework, 
>> more accommodating to development, and to wait with further enhancements for 
>> the future, I have made a few additional features already in this PR. Most 
>> of them are related to needs that arose during development of this PR.
>> 
>> * A build failure summary, similar to the recently added test failure 
>> summary, is added when the build step fails
>> 
>> * The test reporting has been extended to all platforms, including Windows
>> 
>> * Test reporting has been improved slightly, and gotten multiple bug fixes
>> 
>> * All artifacts are now available for individual download. This includes:
>> 
>>   * The build bundles, per platform
>>   * The test results, per platform and test suite
>>   * Build failure logs, in case of build failure
>> 
>>   The build bundles have a retention period of 24 h, but the rest uses 
>> GitHub's default retention period (currently 90 days). The idea is that you 
>> can use GHA to download builds for platforms you might not have access to, 
>> but after that, conserving the builds does not make sense. GitHub currently 
>> provides free, unlimited storage (within the retention period) for 
>> artifacts, so we can afford this.
>> 
>> * The GHA process starts up much faster, which mean that e.g. a build 
>> failure on an exotic platform will show up earlier. This will not really 
>> affect the overall run time though, since it is bounded by variables such as 
>> queuing for workers, and waiting on tests with somewhat arbitrarily run 
>> times to finish.
>> 
>> ### Additional changes outside GHA
>> 
>> I also needed to make a few tweaks to the build system to play nice with the 
>> new GHA code.
>> 
>> * The build failure summary is now stored in 
>> build/$BUILD/make-support/failure-summary.log
>> 
>> * The configure summary now indicates what devkit or sysroot is used, if any
>> 
>> * The --with-sysroot argument is now properly normalized
>> 
>> ### Test failures
>> 
>> A handful of tests, which relies on shell behavior, turned out to fail on 
>> Windows when running under MSYS2. I have filed separate bugs, and submitted 
>> PRs, to get these fixed:
>> 
>> * https://bugs.openjdk.org/browse/JDK-8287902
>> 
>> * https://bugs.openjdk.org/browse/JDK-8287895
>
> Magnus Ihse Bursie has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Apparently that was not a legal reference to actions/checkout. Try another 
> way.

This all looks very nice. I have a few inline comments.

The general one is that we need to make sure:
 - jtreg needs to be built from its mainline at a given tag;
 - tests need to pass (I assume you are waiting for test fixes to land first);

.github/actions/get-bootjdk/action.yml line 26:

> 24: #
> 25: 
> 26: name: 'Get BootJDK'

Here and later, polishing: "BootJDK" -> "boot JDK"?

.github/actions/get-bundles/action.yml line 103:

> 101:           jdk_dir="$(cygpath $jdk_dir)"
> 102:           symbols_dir="$(cygpath $symbols_dir)"
> 103:           tests_dir="$(cygpath $tests_dir)"

Here, and maybe later: how safe it is to use Cygwin utilities like `cygpath` in 
MSYS2 context?

.github/actions/get-jtreg/action.yml line 55:

> 53:         ref: 7903206-build-on-msys2
> 54:         path: jtreg/src
> 55:       if: steps.get-cached-jtreg.outputs.cache-hit != 'true'

Just leaving the comment here so we don't forget to change it to mainline jtreg 
location before integrating this PR :)

.github/actions/get-msys/action.yml line 26:

> 24: #
> 25: 
> 26: name: 'Get msys'

Here and later, polishing: call it consistently "msys2" or "MSYS2"?

.github/workflows/build-cross-compile.yml line 89:

> 87:           sudo apt-get install gcc-${{ inputs.apt-gcc-version }} g++-${{ 
> inputs.apt-gcc-version }} libxrandr-dev${{ inputs.apt-architecture }} 
> libxtst-dev${{ inputs.apt-architecture }} libcups2-dev${{ 
> inputs.apt-architecture }} libasound2-dev${{ inputs.apt-architecture }}
> 88:           sudo update-alternatives --install /usr/bin/gcc gcc 
> /usr/bin/gcc-10 100 --slave /usr/bin/g++ g++ /usr/bin/g++-10
> 89:           sudo apt-get install gcc-10-${{ matrix.gnu-arch }}-linux-gnu${{ 
> matrix.gnu-abi}}=10.3.0-1ubuntu1~20.04cross1 g++-10-${{ matrix.gnu-arch 
> }}-linux-gnu${{ matrix.gnu-abi}}=10.3.0-1ubuntu1~20.04cross1

Should `10.3.0-1ubuntu1~20.04cross1` also go into common "var", like 
`apt-gcc-version` did?

.github/workflows/build-cross-compile.yml line 111:

> 109:           buster
> 110:           sysroot
> 111:           http://httpredir.debian.org/debian/

While we are at it, we can change to `https://` in cases like these...

.github/workflows/main.yml line 138:

> 136:       apt-gcc-version: '10-multilib'
> 137:       apt-architecture: ':i386'
> 138:       apt-add-architecture: 'sudo dpkg --add-architecture i386'

Feels a bit weird to have the entire command line here. I'd expect to see 
something like `apt-add-architecture: i386`.

make/InitSupport.gmk line 456:

> 454:        $(PRINTF) "\n* All command lines available in 
> $(MAKESUPPORT_OUTPUTDIR)/failure-logs.\n" ; \
> 455:        $(PRINTF) "=== End of repeated output ===\n" ; \
> 456:      )  >> $(MAKESUPPORT_OUTPUTDIR)/failure-summary.log  \

Changes in this file look like a generic build system improvement, so maybe it 
should be forked out as separate issue? In case this part of build system 
change regresses something, I would be odd to track it down to GHA workflow 
improvement.

-------------

PR: https://git.openjdk.org/jdk/pull/9063

Reply via email to