Aaron Fabbri created HADOOP-19896:
-------------------------------------

             Summary: ci: improve performance of toolchain image building
                 Key: HADOOP-19896
                 URL: https://issues.apache.org/jira/browse/HADOOP-19896
             Project: Hadoop Common
          Issue Type: Sub-task
            Reporter: Aaron Fabbri


Our Hadoop CI first builds a "toolchain" container image, based on the 
`dev_support/docker` files. It then uses this container to build our code, and 
also to run some tests.

Our initial GitHub workflows (inspired by Apache Spark) always build a 
container image on each workflow trigger, relying on container build caching to 
reduce runtime.

I propose we try a more efficient approach, and see whether or not it is an 
improvement:

1. Build and publish toolchain images whenever we push to trunk and the pushed 
changes affect any definitions that influence the toolchain image (i.e. the 
image needs a refresh).
2. (extra credit) Also trigger this toolchain build workflow on a schedule, just 
to ensure that the toolchain images are still updated if for some reason no 
changes are made to trunk for some "max age" time (e.g. weekly).
3. PRs that don't make toolchain changes just download the latest trunk 
toolchain build. PRs that do make toolchain changes can either (A) fall back to 
the current behavior automatically, or (B) change a workflow variable to force 
that behavior. (A seems preferable.)
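
The three rules above can be sketched as a small decision function. This is 
only an illustration, not an existing workflow: the path pattern list, the 
event names, and the returned action labels are all assumptions, and the real 
set of files that influence the toolchain image would need to be worked out.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for files that influence the toolchain image.
# Only `dev_support/docker` comes from the issue text; the real list may be longer.
TOOLCHAIN_PATTERNS = ("dev_support/docker/*",)


def affects_toolchain(changed_files, patterns=TOOLCHAIN_PATTERNS):
    """Return True if any changed path matches a toolchain pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in patterns
    )


def toolchain_action(event, changed_files=()):
    """Pick what CI should do with the toolchain image.

    `event` is an assumed label: "push" (to trunk), "schedule", or
    "pull_request". The returned strings are illustrative action names.
    """
    if event == "schedule":
        # Rule 2: periodic refresh regardless of what changed.
        return "build-and-publish"
    touched = affects_toolchain(changed_files)
    if event == "push":
        # Rule 1: publish only when toolchain definitions changed.
        return "build-and-publish" if touched else "skip"
    if event == "pull_request":
        # Rule 3, option A: fall back to building in the PR automatically.
        return "build-in-pr" if touched else "pull-latest"
    raise ValueError(f"unknown event: {event!r}")
```

For example, `toolchain_action("pull_request", ["hadoop-common/pom.xml"])` 
returns `"pull-latest"`, while a PR touching `dev_support/docker/Dockerfile` 
returns `"build-in-pr"`.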



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
