Thanks Nathan for the tip. I don't think it will help much, since our bottleneck is not really Docker but the sheer number of builds we're running.
Good news is that everything is under control so far, thanks to everyone for your cooperation! Here's my Daily Update on our CI Situation:
https://github.com/apache/nuttx/issues/14376

Lup

On Tue, 22 Oct 2024, 9:18 pm Nathan Hartman, <hartman.nat...@gmail.com> wrote:

> Hi folks,
>
> The following email was posted to builds@ today and might contain something
> relevant to reducing our GitHub runners? Forwarded message below...
>
> [1] https://lists.apache.org/thread/pnvt9b80dnovlqmrf5n10ylcf9q3pcxq
>
> ---------- Forwarded message ---------
> From: Lari Hotari <lhot...@apache.org>
> Date: Tue, Oct 22, 2024 at 7:08 AM
> Subject: Sharing Apache Pulsar's CI solution for Docker image sharing with
> GitHub Actions Artifacts within a single workflow
> To: <bui...@apache.org>
>
>
> Hi all,
>
> Just in case it's useful for someone else, in Apache Pulsar, there's a
> GitHub Actions-based CI workflow that creates a Docker image and runs
> integration tests and system tests with it. In Pulsar, we have an extremely
> large Docker image for system tests; it's over 1.7GiB when compressed with
> zstd. Building this image takes over 20 minutes, so we want to share the
> image within a single build workflow. GitHub Artifacts are the recommended
> way to share files between jobs in a single workflow, as explained in the
> GitHub Actions documentation:
>
> https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/storing-and-sharing-data-from-a-workflow
>
> To share the Docker image within a single build workflow, we use GitHub
> Artifacts upload/download with a custom CLI tool that uses the
> GitHub-provided JavaScript libraries for interacting with the GitHub
> Artifacts backend API. The benefit of the CLI tool for GitHub Actions
> Artifacts is that it can upload from stdin and download to stdout. Sharing
> the Docker images in the GitHub Actions workflow is simply done with the
> CLI tool and standard "docker load" and "docker save" commands.
>
> These are the shell script functions that Apache Pulsar uses:
>
> https://github.com/apache/pulsar/blob/1344167328c31ea39054ec2a6019f003fb8bab50/build/pulsar_ci_tool.sh#L82-L101
>
> In Pulsar CI, the command for saving the image is:
> docker save ${image} | zstd | pv -ft -i 5 | pv -Wbaf -i 5 | timeout 20m gh-actions-artifact-client.js upload --retentionDays=$ARTIFACT_RETENTION_DAYS "${artifactname}"
>
> For restoring, the command used is:
> timeout 20m gh-actions-artifact-client.js download "${artifactname}" | pv -batf -i 5 | unzstd | docker load
>
> The throughput is very impressive. Transfer speed can exceed 180MiB/s when
> uploading the Docker image, and downloads are commonly over 100MiB/s in
> apache/pulsar builds. It's notable that the transfer includes the execution
> of "docker load" and "docker save" since it's directly operating on stdin
> and stdout.
> Examples:
> upload:
> https://github.com/apache/pulsar/actions/runs/11454093832/job/31880154863#step:15:26
> download:
> https://github.com/apache/pulsar/actions/runs/11454093832/job/31880164467#step:9:20
>
> Since GitHub Artifacts doesn't provide an official CLI tool, I have written
> a GitHub Action for that purpose. It's available at
> https://github.com/lhotari/gh-actions-artifact-client.
> When you use the action, it will install the CLI tool available as
> "gh-actions-artifact-client.js" in the PATH of the runner so that it's
> available in subsequent build steps. In Apache Pulsar, we fork external
> actions to our own repository, so we use the version forked to
> https://github.com/apache/pulsar-test-infra.
>
> In Pulsar, we have been using this solution successfully for several years.
> I recently upgraded the action to support the GitHub Actions Artifacts API
> v4, as earlier API versions will be removed after November 30th.
>
> I hope this helps other projects that face similar CI challenges as Pulsar
> has.
> Please let me know if you need any help in using a similar solution
> for your Apache project's CI.
>
> -Lari
>
> (end of forwarded message)
>
> WDYT? Relevant to us?
>
> Cheers,
> Nathan
>
> On Thu, Oct 17, 2024 at 2:10 AM Lee, Lup Yuen <lu...@appkaki.com> wrote:
>
> > Hi All: We have an ultimatum to reduce (drastically) our usage of GitHub
> > Actions. Or our Continuous Integration will halt totally in Two Weeks.
> > Here's what I'll implement within 24 hours for `nuttx` and `nuttx-apps`
> > repos:
> >
> > (1) When we submit or update a Complex PR that affects All Architectures
> > (Arm, RISC-V, Xtensa, etc): CI Workflow shall run only half the jobs.
> > Previously CI Workflow will run `arm-01` to `arm-14`, now we will run only
> > `arm-01` to `arm-07`. (This will reduce GitHub Cost by 32%)
> >
> > (2) When the Complex PR is Merged: CI Workflow will still run all jobs
> > `arm-01` to `arm-14`
> >
> > (3) For NuttX Admins: We shall have only Four Scheduled Merge Jobs per day.
> > Which means I shall quickly cancel any Merge Jobs that appear. Then at
> > 00:00 / 06:00 / 12:00 / 18:00 UTC: I shall restart the Latest Merge Job
> > that I cancelled. (This will reduce GitHub Cost by 17%)
> >
> > (4) macOS and Windows Jobs (msys2 / msvc): They shall be totally disabled
> > until we find a way to manage their costs. (GitHub charges 10x premium for
> > macOS runners, 2x premium for Windows runners!)
> >
> > We have done an Analysis of CI Jobs over the past 24 hours:
> >
> > - Many CI Jobs are Incomplete: We waste GitHub Runners on jobs that
> > eventually get superseded and cancelled
> >
> > - When we Halve the CI Jobs: We reduce the wastage of GitHub Runners
> >
> > - Scheduled Merge Jobs will also reduce wastage of GitHub Runners, since
> > most Merge Jobs don't complete (only 1 completed yesterday)
> >
> > Please check out the analysis below. And let's discuss further in this
> > NuttX Issue. Thanks!
> >
> > https://github.com/apache/nuttx/issues/14376
> >
> > Lup
> >
> > >> ---------- Forwarded message ---------
> > >> From: Daniel Gruno <humbed...@apache.org>
> > >> Date: Wed, Oct 16, 2024 at 12:08 PM
> > >> Subject: [WARNING] All NuttX builds to be turned off by October 30th
> > >> UNLESS...
> > >> To: <priv...@nuttx.apache.org>
> > >> Cc: ASF Infrastructure <priv...@infra.apache.org>
> > >>
> > >>
> > >> Hello again, NuttX folks.
> > >> This is a formal notice that your CI builds are far exceeding the
> > >> maximum resource use set out by our CI policies[1]. As you are currently
> > >> exceeding your limits by more than 300%[2] and have not shown any signs
> > >> of decreasing, we will be disabling GitHub Actions for your project on
> > >> October 30th unless you manage to get the usage under control and below
> > >> the established limits of 25 full-time runners in a single week.
> > >>
> > >> If you have any further questions, feel free to reach out to us at
> > >> priv...@infra.apache.org
> > >>
> > >> With regards,
> > >> Daniel on behalf of ASF Infra.
> > >>
> > >>
> > >> [1] https://infra.apache.org/github-actions-policy.html
> > >> [2] https://infra-reports.apache.org/#ghactions&project=nuttx
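
For anyone who wants to see rules (1) to (3) from the 17 Oct message in concrete form, here is a minimal sketch in POSIX shell. It is an illustration only, not the actual NuttX CI workflow: the function names `select_arm_jobs` and `latest_merge_slot` and the event labels `pr` / `merge` are hypothetical. Only the job names `arm-01` to `arm-14` and the four UTC slots are taken from the thread.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the rules quoted in this thread.
# They are NOT part of the real nuttx / nuttx-apps CI scripts.

# Print the arm-NN jobs to run for an event type (labels are assumptions):
#   pr    -> Complex PR submitted or updated: arm-01 .. arm-07
#   merge -> Complex PR merged:               arm-01 .. arm-14
select_arm_jobs() {
  case "$1" in
    pr)    last=7 ;;
    merge) last=14 ;;
    *)     echo "unknown event: $1" >&2; return 1 ;;
  esac
  i=1
  while [ "$i" -le "$last" ]; do
    printf 'arm-%02d\n' "$i"
    i=$((i + 1))
  done
}

# Print the most recent scheduled merge slot (00:00, 06:00, 12:00 or
# 18:00 UTC) at or before the given UTC hour (0-23).
latest_merge_slot() {
  hour=${1#0}        # drop one leading zero so "08" is not parsed as octal
  hour=${hour:-0}    # a bare "0" becomes empty after stripping; use 0
  printf '%02d:00\n' $(( (hour / 6) * 6 ))
}
```

Calling `select_arm_jobs pr` lists the seven jobs that run for a Complex PR, and `latest_merge_slot "$(date -u +%H)"` prints the slot whose Merge Job would be the one to restart at the current UTC hour.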