kevingurney opened a new pull request, #326:
URL: https://github.com/apache/arrow-site/pull/326

   # Overview
   
   This pull request modifies the `apache/arrow-site` website deployment 
workflow (`.github/workflows/deploy.yml`) to run inside an `ubuntu:latest` 
container in order to resolve the build issue described in #325.
   
   Running inside a container should help avoid unexpected breaking changes 
to dependencies that can occur when depending on the GitHub-managed runner 
image `ubuntu-latest`. In addition, using a container as the workflow 
environment means that developers can more easily reproduce the CI behavior 
locally by running the same container in their own development environment.
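
   For context, the general shape of such a change can be sketched roughly as 
follows (the job and step names here are illustrative, not the actual contents 
of `deploy.yml`; the package list is taken from the dependencies mentioned 
below):

   ```yaml
   # Hypothetical sketch of a container-based deployment job.
   jobs:
     deploy:
       # The host runner still needs to be specified; the job's steps
       # execute inside the container declared below.
       runs-on: ubuntu-latest
       container:
         image: ubuntu:latest
       steps:
         # The base ubuntu:latest image is minimal, so tools that the
         # GitHub-managed runner image ships by default must be installed.
         - name: Install dependencies
           run: |
             apt-get update
             apt-get install -y git rsync libyaml-0-2
         - uses: actions/checkout@v3
   ```

   Pinning the job to a plain `ubuntu:latest` image also means the same 
environment can be approximated locally with `docker run -it ubuntu:latest`.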
   
   # Qualification
   
   To qualify these changes, I:
   
   1. Submitted these changes to the `main` branch of the 
`mathworks/arrow-site` fork in order to trigger the `gh-pages` deployment 
workflow. I then selected `gh-pages` as the GitHub Pages deployment branch and 
verified that the site was deployed as expected to 
https://mathworks.github.io/arrow-site/. For an example of a successful 
workflow run, see: 
https://github.com/mathworks/arrow-site/actions/runs/4313253336/jobs/7524824999.
   2. Inspected the GitHub Actions workflow steps to ensure there were no 
errors.
    
   # Future Directions
   
   1. While qualifying with the [fork deployment 
workflow](https://github.com/apache/arrow-site#deployment), I realized that I 
needed to [manually change the GitHub Pages deployment 
branch](https://docs.github.com/en/pages/quickstart) from `asf-site` to 
`gh-pages` in the "Pages" settings of the `mathworks/arrow-site` fork. This 
wasn't immediately obvious, and it [isn't listed explicitly as a required step 
in the README.md](https://github.com/apache/arrow-site#deployment) of 
`apache/arrow-site`. It would be helpful to add an explicit note about this 
step. 
I'll follow up with a pull request to add this.
   2. As described in the "Workarounds" section of the description of 
apache/arrow-site#325, there is still more we could choose to do to address the 
root cause of these build failures (the deprecation of the `md4` hash algorithm 
in Node 18). This would include upgrading to the latest version of Webpack, 
setting the `output.hashFunction` to `xxhash64` for Webpack, and upgrading to 
the latest version of Node.js (i.e. version 19).
   3. Since moving the workflow inside a container requires downloading 
dependencies (e.g. `git`, `rsync`, `libyaml-0-2`) using `apt-get`, **this 
adds approximately one minute to the running time of the GitHub Actions 
workflow**. It may be possible to mitigate this through caching and/or a 
custom Dockerfile that has the required dependencies pre-installed.
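
   As a rough sketch of the custom-Dockerfile idea in item 3, a hypothetical 
image with the workflow's `apt` dependencies baked in might look like the 
following (the package list is assumed from the dependencies named above):

   ```dockerfile
   # Hypothetical image with the workflow's apt dependencies pre-installed,
   # so the deployment job can skip the per-run apt-get install step.
   FROM ubuntu:latest
   RUN apt-get update && \
       apt-get install -y --no-install-recommends git rsync libyaml-0-2 && \
       rm -rf /var/lib/apt/lists/*
   ```

   The workflow's `container.image` would then point at this published image 
instead of `ubuntu:latest`, trading the per-run install time for the cost of 
maintaining the image.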
   
   # Notes
   
   1. The additional minute of running time added by installing dependencies 
with `apt-get` is somewhat unfortunate. That said, it may be acceptable to 
proceed with this overhead for now to unblock the deployment workflow. For 
comparison, the [`arrow-ballista` workflow that @avantgardnerio 
shared](https://github.com/apache/arrow-ballista/blob/b61cfbf54705f4cbfcbc7103f87509e49cd01fda/.github/workflows/rust.yml#L79)
 as an example of running a workflow inside a container doesn't use caching 
and takes approximately the same amount of time to download its required 
dependencies with `apt-get`. Of course, I am more than happy to investigate 
caching or alternative approaches in more depth if the community feels the 
additional overhead is too much.
   2. Thank you @sgilmore10 for your help with this pull request!
   3. Thank you to @avantgardnerio for your suggestion to move the deployment 
workflow inside of an `ubuntu:latest` container!
   
   Closes apache/arrow-site#325.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.