Hi All: We're modifying NuttX CI (Continuous Integration) and GitHub Actions to comply with ASF Policy. Unfortunately, these changes will extend the Build Duration for a NuttX Pull Request by roughly 15 mins: from 2 hours to 2.25 hours.
Lemme explain: Right now, every NuttX Pull Request triggers 24 Concurrent Jobs (GitHub Runners), executing them in parallel: https://lupyuen.github.io/articles/ci

According to ASF Policy, we should run at most 15 Concurrent Jobs: https://infra.apache.org/github-actions-policy.html

Thus we'll cut down the Concurrent Jobs from 24 to 15. That's 12 Linux Jobs, 2 macOS, 1 Windows. (Each job takes 30 mins to 2 hours)

For Phase 1: https://lupyuen.github.io/articles/ci#appendix-phase-1-of-ci-upgrade

(1) Right now our "Linux > Strategy" is a flat list of 20 Linux Jobs, all executed in parallel...

    matrix:
      boards: [arm-01, arm-02, arm-03, arm-04, arm-05, arm-06, arm-07, arm-08, arm-09, arm-10, arm-11, arm-12, arm-13, other, risc-v-01, risc-v-02, sim-01, sim-02, xtensa-01, xtensa-02]

(2) We change "Linux > Strategy" to prioritise by Target Architecture, and limit it to 12 Concurrent Jobs...

    max-parallel: 12
    matrix:
      boards: [
        arm-01, other, risc-v-01, sim-01, xtensa-01,
        arm-02, risc-v-02, sim-02, xtensa-02,
        arm-03, arm-04, arm-05, arm-06, arm-07,
        arm-08, arm-09, arm-10, arm-11, arm-12, arm-13
      ]

(3) So NuttX CI will initially execute 12 Build Jobs across Arm32, Arm64, RISC-V, Simulator and Xtensa. As they complete, NuttX CI will execute the remaining 8 Build Jobs (all Arm32).

(4) This will extend the Overall Build Duration from 2 hours to 2.25 hours (link above).

(5) We also limit macOS Jobs to 2 and Windows Jobs to 1. (See the macOS / Windows sketch below)

Here's the Draft PR, please lemme know what you think: https://github.com/apache/nuttx/pull/13412

For Phase 2: https://lupyuen.github.io/articles/ci#appendix-phase-2-of-ci-upgrade

We should "rebalance" the Build Targets: move the Newer / Higher Priority / Riskier Targets to arm-01, risc-v-01, sim-01, xtensa-01. Hopefully this will allow NuttX CI to Fail Faster (for breaking changes) and prevent unnecessary builds, which also reduces waiting time. (See the fail-fast sketch below)

For Phase 3: https://lupyuen.github.io/articles/ci#appendix-phase-3-of-ci-upgrade

We should migrate most of the NuttX Targets to a Daily Job for Build and Test. (See the cron sketch below)
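Here's a minimal sketch of the macOS / Windows limits in point (5). The job names, runner labels and board lists are hypothetical, the actual change is in the Draft PR above:

    # Sketch only: cap macOS at 2 and Windows at 1 Concurrent Job
    jobs:
      macos:
        runs-on: macos-13
        strategy:
          max-parallel: 2      # at most 2 macOS Runners at a time
          matrix:
            boards: [macos-01, macos-02, macos-03]
        steps:
          - run: echo "Build ${{ matrix.boards }}"
      windows:
        runs-on: windows-latest
        strategy:
          max-parallel: 1      # at most 1 Windows Runner at a time
          matrix:
            boards: [sim-windows]
        steps:
          - run: echo "Build ${{ matrix.boards }}"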
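One more note on Phase 2: GitHub cancels the remaining Matrix Jobs as soon as one fails, if fail-fast is enabled (it's the default). So putting the Riskier Targets in front means a Breaking Change stops the run early. A sketch, assuming we keep the default:

    strategy:
      fail-fast: true    # default: cancel remaining jobs when one fails
      max-parallel: 12
      matrix:
        # Riskier / Higher Priority Targets go first
        boards: [
          arm-01, other, risc-v-01, sim-01, xtensa-01,
          arm-02, risc-v-02, sim-02, xtensa-02,
          arm-03, arm-04, arm-05, arm-06, arm-07,
          arm-08, arm-09, arm-10, arm-11, arm-12, arm-13
        ]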
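As for the Phase 3 Daily Job: it could be a separate Scheduled Workflow. A minimal sketch, where the workflow name and timing are assumptions (nothing's decided yet):

    # Sketch: build and test all the Targets once daily, not per-PR
    name: Daily Build and Test
    on:
      schedule:
        - cron: '0 0 * * *'    # every day at 00:00 UTC
      workflow_dispatch:       # allow manual runs for testing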
Please check out the discussion below.

Lup

On Wed, Sep 11, 2024 at 11:02 PM Lee, Lup Yuen <lu...@appkaki.com> wrote:

> << For PRs, depending on the directory (directories) of the modified
> file(s), pick and choose which tests to run. >>
>
> Thanks Nathan for the cool ideas! I was thinking: If the Modified Source
> File is shared by Multiple Targets, then which NuttX Target do we build
> and test?
>
> Maybe we could take the NuttX Target ELF, and `objdump` the Arm / RISC-V
> Disassembly, producing the Source Pathnames. Then we could figure out
> which NuttX Target depends on which Source File? (See the objdump sketch
> at the end of this message)
>
> << In addition to GitHub Actions, the ASF also offers BuildBot and
> Jenkins. In order to continue testing all ~1600 configurations every day,
> we could adopt one of these systems to make one full test run nightly. >>
>
> Yep, we could rewrite our GitHub Actions workflows for BuildBot and
> Jenkins. Alternatively: Could we exploit this GitHub Actions Loophole...
>
> Suppose we fork the NuttX Repo into our Personal GitHub Accounts. Any
> GitHub Runners that we trigger in our own repos will NOT be counted
> against the ASF Quota for GitHub Runners!
>
> So we could have a bunch of Personal GitHub Accounts running NuttX Builds
> and Tests every day. We could create a system to distribute / scatter /
> gather the Builds and Tests across the "crowd-sourced" accounts?
>
> Lup
>
> On Wed, Sep 11, 2024 at 4:17 AM Nathan Hartman <hartman.nat...@gmail.com>
> wrote:
>
>> Thank you Lup!
>>
>> I have been thinking about ways to reduce the number of builds and
>> compute costs while still getting good (or even better) test coverage:
>>
>> For PRs, depending on the directory (directories) of the modified
>> file(s), pick and choose which tests to run. We already do this for
>> Documentation vs all others, so it could be expanded to make the logic
>> more fine-grained. (See the paths sketch at the end of this message)
>>
>> Obviously things like sched and upper half drivers can affect all
>> builds, but not all PRs touch those.
>>
>> Many PRs fix a specific architecture or board related issue.
>>
>> Whenever a PR is limited to a board directory, only the configs in that
>> directory should be tested.
>>
>> When a PR is limited to an arch, we could run tests for all boards in
>> that arch, but that seems wasteful. Perhaps we could choose one board
>> from each arch and only test it? It would have to be the most
>> feature-packed board in that arch to get acceptable test coverage. As a
>> special case, if a PR affects both an arch and a board within that arch,
>> test the affected board.
>>
>> Another idea is perhaps to use some kind of round-robin approach: test
>> only one board per PR test run, but use a different board each time.
>> Eventually all boards get tested. (See the round-robin sketch at the end
>> of this message) Yes, I know that issues won't be caught immediately,
>> but the commit range will be known (within the last ~1600 PRs merged)
>> and git bisect can find the specific commit with only a few tests. This
>> is a cost/benefit decision. Also, see below:
>>
>> 2.) In addition to GitHub Actions, the ASF also offers BuildBot and
>> Jenkins.
>>
>> In order to continue testing all ~1600 configurations every day, we
>> could adopt one of these systems to make one full test run nightly.
>>
>> This way, instead of running ~1600 builds multiple times per day at high
>> cost (and, for those builds that end up being redundant with no
>> effective difference, high cost and low benefit), we could instead run
>> all the configurations once per day, get virtually the same amount of
>> benefit, and greatly reduce the compute cost.
>>
>> We could pick an off-peak time of day for the tests. Or, we could ask
>> Infra when Jenkins or BuildBot tend to be quiet, and schedule our
>> "nightly" (could be morning or afternoon depending on where you live)
>> tests for that time.
>>
>> One downside to this approach is that some broken PRs may be merged and
>> not caught until the next day, so we may have a little bit of breakage.
>> It remains to be seen how much impact that could actually cause. This
>> can be addressed in various ways, which we can discuss if it becomes a
>> problem in practice.
>>
>> Thoughts?
>>
>> Cheers,
>> Nathan
>>
>> On Tue, Sep 10, 2024 at 9:51 AM Lee, Lup Yuen <lu...@appkaki.com> wrote:
>>
>> > This article explains how we're running Continuous Integration with
>> > GitHub Actions. Every NuttX Pull Request will trigger 1,594 NuttX
>> > Builds!
>> >
>> > https://lupyuen.codeberg.page/articles/ci.html
>> >
>> > In my next message: I'll discuss how we might cut down the NuttX
>> > Builds for Continuous Integration. Stay Tuned!
>> >
>> > Lup
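PS: Some rough sketches of the ideas quoted above. First, Nathan's per-directory test selection: the Documentation special case we have today can be expressed with GitHub's built-in path filters, something like this (the filter pattern is an assumption, not our actual workflow):

    # Sketch: run the full build only when a PR touches more than Documentation
    on:
      pull_request:
        paths-ignore:
          - 'Documentation/**'

For finer-grained logic (board directories vs arch directories), a step-level filter such as the third-party dorny/paths-filter action could map the changed paths to a list of Build Targets.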
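For the `objdump` idea: `-d -l` interleaves Source Pathnames into the disassembly, so a step like this might recover which Source Files a Target ELF depends on (toolchain prefix, ELF name and grep pattern are all assumptions, untested):

    # Sketch: extract the Source Pathnames baked into a NuttX ELF
    - name: Map Target ELF to Source Files
      run: |
        arm-none-eabi-objdump -d -l nuttx \
          | grep -o '^/[^:]*\.[ch]' \
          | sort -u > sources.txt

Comparing sources.txt against the files modified by a Pull Request would tell us whether that Target needs rebuilding.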
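And Nathan's round-robin idea could key off the Pull Request number, so each PR builds a different board (the board list here is illustrative only):

    # Sketch: rotate through the boards, one per Pull Request
    - name: Pick Round-Robin Board
      id: pick
      run: |
        BOARDS=(arm-01 risc-v-01 sim-01 xtensa-01)
        INDEX=$(( ${{ github.event.number }} % ${#BOARDS[@]} ))
        echo "board=${BOARDS[$INDEX]}" >> "$GITHUB_OUTPUT"

Later steps would read steps.pick.outputs.board to decide which Target to build.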