ddanielr commented on code in PR #384: URL: https://github.com/apache/accumulo-website/pull/384#discussion_r1177328415
##########
Dockerfile:
##########
@@ -0,0 +1,40 @@
+# This Dockerfile builds an ruby environment for jekyll that empowers
+# making updates to the accumulo website without requiring the dev
+# to maintain a local ruby development environment.
+
+FROM ruby:2.7.8-slim-bullseye as base
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+  build-essential \
+  git \
+  curl \
+  && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /site
+
+
+# Copy over the Gemfiles so that all build dependencies are installed
+# during build vs at runtime.
+
+COPY Gemfile /site/Gemfile
+COPY Gemfile.lock /site/Gemfile.lock

Review Comment:
> Maybe "/mnt/repo" or "/mnt/staging" or "/mnt/build" or "/mnt/workdir"? I like putting things in `/mnt` because it makes things pretty clear that it's mounted, and from the perspective from inside the container, this is a pretty standard location for external filesystems to be mounted at.

Out of those options I like `/mnt/workdir` the best.

> > So everything that happens in the Dockerfile is part of the container build stage. This is completely separate from actually running the container.
>
> My question isn't really about WHEN the RUN commands execute (I get that it's during the build stage), but WHERE (because it doesn't make sense that they would run locally on the host). My understanding is that the build stage involves running a container from the base image, executing RUN commands to modify the image, and when that build container stops, the resulting image is the modified image that we use to launch our own containers.
>
> Essentially:
>
> ```
> docker build -> create temporary build container from base image, start build container, exec RUN commands
> inside build container, stop build container, save modified image, delete build container
> docker create -> create container from image
> docker start -> run container with specified command, or stored CMD if command is unspecified
> docker run -> shorthand for docker create + docker start
> ```

RUN commands are always run in a Docker image and are executed in the WORKDIR that has been specified in the Dockerfile.
https://docs.docker.com/engine/reference/builder/#run

For now, realize that you don't have a container "running" as part of the build process. Instead you have something similar to a git tree, where each command in a Dockerfile is executed by the Docker daemon and results in a new commit layer being added (ENV, LABEL, and WORKDIR are special and don't create a new commit layer). That new layer is then used by the next command as its container context.
https://docs.docker.com/build/building/packaging/#dockerfile

The final image layer generated is the last "layer" of changes that comprise the image and is tagged as the final image. So it's different from building a virtual machine image, where you only get a single output image based on all the commands run within the image at build. This allows Docker to share image layers between containers and save space.

> Some rendering might use /tmp inside the container. I'm not sure if any of that is preserved inside the container, but I don't think we care if it is or isn't.

The GitHub action only referenced publishing the contents of `_site`, so I don't think the possible use of `/tmp` matters since it was never published before.
https://github.com/apache/accumulo-website/blob/0d3904b0c3b4eb839bb59158946d9068fba7ef53/.github/workflows/jekyll.yaml#L51-L56

> > Now, we want to mount the rendered content into the validator container.
> > However, the webdev-validator container needs to have a slightly modified Gemfile.
> > To accomplish this, the run command only mounts $PWD/_site to /site/_site. This sets up the following structure
>
> My understanding here is that the build of the validator container does _not_ have the repo mounted. So, the Gemfiles it is modifying are the ones copied to the base image when that was built.

Correct.

> The confusion here is that when we run a container from the base image, we're mounting over the same directory as we copied these files. Are these just masked/hidden by the mount, then? Usually it's a good idea to mount to an empty directory. But in this case, it seems we're not doing that in the base image.

I agree that it's a good idea to mount to an empty directory for traditional systems. However, mapping an existing directory over top of a pre-defined one is a standard action with containers. Docker just hides the contents of the container directory while the mount is in place and references the host dir.

> If we're mounting over the copied Gemfiles, then there's no reason to create a second validator image... we can just modify the Gemfile during the base build, install _all_ the gems to the image during the base image build. When we mount our website repo to render the Jekyll, we hide the modified Gemfiles. When we mount our rendered `_site` directory to do the validation, we re-expose the modified Gemfiles. We don't need a second image at all... we can do everything we want with the single image.
>
> Actually, we don't even need to copy over the Gemfiles during the build stage... those aren't needed for the htmlproofer. htmlproofer has need of its own Gemfiles. We're only copying over the ones from the repo and modifying them... but we don't need to do that... we can just have a separate set for the htmlproofer... and we don't even need the lock one at all for that.
>
> I think we can simplify things dramatically by just using one container.

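
A rough sketch of the mount behavior discussed above may help here. The `COPY` lines mirror the Dockerfile under review. The `bundle install` step and the `webdev` image tag are assumptions for illustration only, and the `apt-get` step is left out for brevity.

```dockerfile
# Sketch only: the bundle install step and the "webdev" image tag used in
# the comments below are assumptions, not taken from the PR.
FROM ruby:2.7.8-slim-bullseye as base

WORKDIR /site

# These files are baked into an image layer at build time.
COPY Gemfile /site/Gemfile
COPY Gemfile.lock /site/Gemfile.lock
RUN bundle install

# At run time (shown as comments, since these are host commands):
#
#   docker run --rm -v "$PWD":/site webdev ls /site
#     -> lists the host repo; the image's copied Gemfiles are hidden
#        for the life of the mount, not deleted from the image.
#
#   docker run --rm -v "$PWD/_site":/site/_site webdev ls /site
#     -> only /site/_site comes from the host; the image's
#        /site/Gemfile and /site/Gemfile.lock remain visible.
```

The second command is the case the single-image idea relies on: mounting only `_site` leaves whatever Gemfiles the image carries in `/site` visible.
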

My original design had the single container, but I had split it into two to restrict the Gemfile changes. So if there's a better way to separate Gemfile changes, I'm happy to try that. I do agree that it massively simplifies things.

> I do have one more question about WORKDIR... it seems that's overloaded to specify the current working directory inside the build container as well as the runtime container. But, I don't see why that necessarily needs to be the case.

I'm not exactly sure what question you're asking here, but I'll try to provide the information I know.

WORKDIR sets the working directory for any following RUN, COPY, ENTRYPOINT, and CMD instructions in the Dockerfile. If you don't set WORKDIR explicitly, it defaults to `/` unless the image in your FROM instruction already defined one.
Typically you would set the WORKDIR to ensure that all of your operations run from an intended place, rather than in unexpected locations within the container.

If you wanted your runtime container to run things in a different location than the build commands, you can simply add another WORKDIR line in your Dockerfile, and any instructions defined after it (like CMD) will use the new location.
https://docs.docker.com/engine/reference/builder/#workdir

> Also, one thing I was thinking was that it can be really hard to verify links using the rendered HTML, because you don't have an absolute directory to resolve absolute paths. The generated site is intended to be served at the root of a webserver. I don't know if htmlproofer supports this, but it'd probably be better to run the validation against `http://localhost:4000/` rather than `file:///site/_site/`.

htmlproofer seems explicitly designed to run against rendered HTML code in a directory. I didn't see anything in the documentation related to pointing it at a live site.
The project does support [adjusting for a baseurl](https://github.com/gjtorikian/html-proofer#adjusting-for-a-baseurl).

> If we want to get this in quickly, we could get the Dockerfile in without the validator stuff right away. I think that because the validator stuff adds so much complexity (extra steps, extra documentation, maybe extra containers) that still needs polishing, I want to go back to my much earlier opinion that the validation stuff should be added in a separate PR. At this point, I think it's likely to hold up this PR further.

At this point I agree. Figuring out how to support gem isolation is important, but we can get the base image Docker changes in and provide benefit for any developers generating documentation.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
