kevingurney opened a new pull request, #326: URL: https://github.com/apache/arrow-site/pull/326
# Overview

This pull request modifies the `apache/arrow-site` website deployment workflow (`.github/workflows/deploy.yml`) to run inside an `ubuntu:latest` container in order to resolve the build issue described in #325.

Running inside a container should help avoid the unexpected breaking changes to dependencies that can occur when depending on the proprietary GitHub runner image `ubuntu-latest`. In addition, using a container as the workflow environment means that developers can, in principle, more easily reproduce the CI behavior locally by running their own containers in their development environment.

# Qualification

To qualify these changes, I:

1. Submitted these changes to the `main` branch of the `mathworks/arrow-site` fork in order to trigger the `gh-pages` deployment workflow. I then selected `gh-pages` as the GitHub Pages deployment branch and verified that the site was deployed as expected to https://mathworks.github.io/arrow-site/. For an example of a successful workflow run, see: https://github.com/mathworks/arrow-site/actions/runs/4313253336/jobs/7524824999.
2. Inspected the GitHub Actions workflow steps to ensure there were no errors.

# Future Directions

1. While qualifying with the [fork deployment workflow](https://github.com/apache/arrow-site#deployment), I realized that I needed to [manually change the GitHub Pages deployment branch](https://docs.github.com/en/pages/quickstart) from `asf-site` to `gh-pages` in the "Pages" settings of the `mathworks/arrow-site` fork. This wasn't immediately obvious, and it [isn't listed explicitly as a required step in the README.md](https://github.com/apache/arrow-site#deployment) of `apache/arrow-site`. It would be helpful to add an explicit note about this step. I'll follow up with a pull request to add this.
2. As described in the "Workarounds" section of the description of apache/arrow-site#325, there is still more we could choose to do to address the root cause of these build failures (the deprecation of the `md4` hash algorithm in Node 18). This would include upgrading to the latest version of Webpack, setting `output.hashFunction` to `xxhash64` for Webpack, and upgrading to the latest version of Node.js (i.e. version 19).
3. Since moving the workflow inside a container requires downloading dependencies (e.g. `git`, `rsync`, `libyaml-0-2`, etc.) using `apt-get`, **this has added approximately 1 minute of running time to the GitHub Actions workflow**. It may be possible to mitigate this somewhat through the use of caching and/or a custom Dockerfile that has the required dependencies pre-installed.

# Notes

1. The additional 1 minute of running time spent installing dependencies with `apt-get` is somewhat unfortunate. That being said, it may be OK to proceed with this overhead for now to unblock the deployment workflow. For comparison, the [`arrow-ballista` workflow that @avantgardnerio shared](https://github.com/apache/arrow-ballista/blob/b61cfbf54705f4cbfcbc7103f87509e49cd01fda/.github/workflows/rust.yml#L79) as an example of running a workflow inside a container doesn't use caching and takes approximately the same time to download its required dependencies using `apt-get`. Of course, I am more than happy to investigate caching or alternative approaches in more depth if the community feels the additional time overhead is too much.
2. Thank you @sgilmore10 for your help with this pull request!
3. Thank you @avantgardnerio for your suggestion to move the deployment workflow inside an `ubuntu:latest` container!

Closes apache/arrow-site#325.
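
For readers who have not opened the diff, the container-based approach described in this message could look roughly like the following. This is a minimal sketch and not the actual contents of `.github/workflows/deploy.yml`: the workflow name, job name, trigger, and step layout are assumptions made for illustration, and only the container image and the `apt-get` packages come from the description in this message.

```yaml
# Hypothetical sketch; names and structure are illustrative assumptions.
name: Deploy (sketch)

on:
  push:
    branches: [main]

jobs:
  deploy:
    # The job is still scheduled on a GitHub-hosted runner...
    runs-on: ubuntu-latest
    # ...but its steps execute inside this container image.
    container: ubuntu:latest
    steps:
      # The base image is minimal, so tools that the hosted runner
      # image normally provides must be installed explicitly.
      - name: Install system dependencies
        run: |
          apt-get update
          apt-get install -y git rsync libyaml-0-2
      - uses: actions/checkout@v3
      # The site build and deployment steps would follow here.
```

Note that `ubuntu:latest` is itself a moving tag, so pinning a specific release (for example `ubuntu:22.04`) would be one way to make the environment more reproducible, at the cost of having to update the tag manually.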
