jorgecarleitao commented on a change in pull request #8824:
URL: https://github.com/apache/arrow/pull/8824#discussion_r538041597



##########
File path: .github/workflows/cpp.yml
##########
@@ -21,7 +21,7 @@ on:
   push:
     paths:
       - '.github/workflows/cpp.yml'
-      - 'ci/docker/**'
+      - 'ci/docker/*cpp*'

Review comment:
       Sorry it took me so long to go through this comment. I actually think 
that this PR is correct, but it surfaces a bigger problem present in master.
   
   Let's say that we have an image X (typically build via `archery build X`), 
that depends on a `base` image (typically build via `archery build base`).
   
   Let's analyze what happens in the following 4 cases:
   
   * push to PR with no change to base image's dockerfile
   * push to master with no change to base image's dockerfile
   * push to PR with a change to base image's dockerfile
   * push to master with a change to base image's dockerfile
   
   Let's also call the hash of the current base image in the registry as 
`base_v1`.
   
   ## push to PR with no change to base image's dockerfile
   
   When a PR is open, image X is built using the dockerfile on the PR for that 
image, and it uses `base_v1` from the registry to base its build from.
   
   ## push to master with no change to base image's dockerfile
   
   Same as before.
   
   ## push to PR with a change to base image's dockerfile
   
   In this situation, the PR triggers two CI jobs: the build of image `X` and 
the build image `base`. Since these two jobs are independent workflows, image 
`X` will pick `base_v1`, while the base image job will build `base_v2`.
   
   This happens because `docker-compose run X` does not know that `X` depends 
on the compose job `base` to create the image `base`, and instead has that 
image available from the remote docker registry `${REPO}`.
   
   Because PRs do not push images to the registry, a new push to the PR will 
not alter this behavior: the image `X` will continue to be built based on 
`base_v1`. Thus, any error derived from building image `X` from the new image 
`base_v2` will not surface.
   
   ## push to master with a change to base image's dockerfile (e.g. PR is 
merged)
   
   In this situation, there is a race condition (undefined behavior): image X 
will be built using `base_v1` if it starts running before the flow `build image 
base` finishes (the likely scenario), or it picks `base_v2` if the other flow 
has finished.
   
   Thus, atm, whenever a PR changes both dockerfiles, `X` and `base`, the image 
X on the registry is based of `base_v1`, but `base:latest` points to `base_v2`.
   
   Thus, the pipeline will actually pass in master, even if the build of `X` 
using `base_v2` is incorrect(!). On the _next_ push to master that triggers `X` 
to be built, the error surfaces 🤯
   
   ## Summary
   
   In both cases where the dockerfile of `base` changes, that change is not 
propagated to the build of `X`. Therefore, IMO this PR is correct: changes to 
the `dockerfile` of image `base` do not require to trigger a change to the 
build of the image `X`, because there are no constraints in the jobs to build 
`X` to wait for the build of image `base`.
   
   This issue seems to indicate that our builds based of docker images that 
depend on other docker images that are build using `docker-compose build` may 
be too complex as there is hidden state through the registry.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to