[GitHub] [accumulo-website] ddanielr commented on a diff in pull request #384: Add containerized development environment

via GitHub Tue, 25 Apr 2023 21:42:59 -0700


ddanielr commented on code in PR #384:
URL: https://github.com/apache/accumulo-website/pull/384#discussion_r1177328415



##########
Dockerfile:
##########
@@ -0,0 +1,40 @@
+# This Dockerfile builds an ruby environment for jekyll that empowers
+# making updates to the accumulo website without requiring the dev
+# to maintain a local ruby development environment.
+
+FROM ruby:2.7.8-slim-bullseye as base
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    git \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /site
+
+
+# Copy over the Gemfiles so that all build dependencies are installed
+# during build vs at runtime.
+
+COPY Gemfile /site/Gemfile
+COPY Gemfile.lock /site/Gemfile.lock

Review Comment:
   > Maybe "/mnt/repo" or "/mnt/staging" or "/mnt/build" or "/mnt/workdir"? I 
like putting things in `/mnt` because it makes things pretty clear that it's 
mounted, and from the perspective from inside the container, this is a pretty 
standard location for external filesystems to be mounted at.
   
   Out of those options I like `/mnt/workdir` the best. 
   
   > 
   > > So everything that happens in the Dockerfile is part of the container 
build stage. This is completely separate from actually running the container.
   > 
   > My question isn't really about WHEN the RUN commands execute (I get that 
it's during the build stage), but WHERE (because it doesn't make sense that 
they would run locally on the host). My understanding is that the build stage 
involves running a container from the base image, executing RUN commands to 
modify the image, and when that build container stops, the resulting image is 
the modified image that we use to launch our own containers.
   > 
   > Essentially:
   > 
   > ```
   > docker build -> create temporary build container from base image, start 
build container, exec RUN commands 
   > inside build container, stop build container, save modified image, delete 
build container
   > docker create -> create container from image
   > docker start -> run container with specified command, or stored CMD if 
command is unspecified
   > docker run -> shorthand for docker create + docker start
   > ```
   
   RUN commands are always run in a docker image and are executed in the 
WORKDIR that has been specified in the Dockerfile. 
   https://docs.docker.com/engine/reference/builder/#run 
   
   For now, realize that you don't have a container "running" as part of the 
build process.
   Instead you have something similar to a git tree where each command in a 
Dockerfile is executed by the Docker daemon and results in a new commit layer 
being added (ENV, LABEL, and WORKDIR are special and don't create a new commit 
layer).  That new layer is then used by the next command as its container 
context.
   
   https://docs.docker.com/build/building/packaging/#dockerfile 
   
   The final image layer generated is the last "layer" of changes that comprise 
the image and is tagged as the final image.
   
   So it's different from building a virtual machine image, where you only get 
a single output image based on all the commands run within the image at 
build.
   
   This allows Docker to share image layers between containers and save space. 
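   
   As a rough illustration of the layering described above, here is a minimal 
sketch. The `JEKYLL_ENV` value and the `bundle install` step are assumptions 
added for illustration and are not taken from this PR.
   
   ```dockerfile
   # Starts from the layers that already make up the base image.
   FROM ruby:2.7.8-slim-bullseye
   
   # Illustrative value only. ENV is recorded in the image configuration
   # and does not change any files.
   ENV JEKYLL_ENV=development
   
   # Sets the working directory used by the instructions that follow.
   WORKDIR /site
   
   # Adds a filesystem layer that contains the two copied files.
   COPY Gemfile Gemfile.lock /site/
   
   # Runs on top of the previous layer. The files it writes (the installed
   # gems) are stored as a new layer.
   RUN bundle install
   ```
   
   After a build, `docker history <image>` lists one entry per instruction, 
which is a quick way to see which steps added data to the image.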
   
   > Some rendering might use /tmp inside the container. I'm not sure if any of 
that is preserved inside the container, but I don't think we care if it is or 
isn't.
   
   The GitHub Action only publishes the contents of `_site`, so I don't think 
the possible use of `/tmp` matters since it was never published 
before.
   
https://github.com/apache/accumulo-website/blob/0d3904b0c3b4eb839bb59158946d9068fba7ef53/.github/workflows/jekyll.yaml#L51-L56
   
   
   > > Now, we want to mount the rendered content into the validator container. 
However, the webdev-validator container needs to have a slightly modified 
Gemfile.
   > > To accomplish this, the run command only mounts $PWD/_site to 
/site/_site. This sets up the following structure
   > 
   > My understanding here is that the build of the validator container does 
_not_ have the repo mounted. So, the Gemfiles it is modifying are the ones 
copied to the base image when that was built.
   
   Correct. 
   
   > 
   > The confusion here is that when we run a container from the base image, 
we're mounting over the same directory as we copied these files. Are these just 
masked/hidden by the mount, then? Usually it's a good idea to mount to an empty 
directory. But in this case, it seems we're not doing that in the base image.
   
   I agree that it's a good idea to mount to an empty directory for traditional 
systems. 
   
   However, mapping an existing directory over top of a pre-defined one is a 
standard action with containers. 
   Docker just hides the contents of the container directory while the mount 
is in place and references the host dir.
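   
   To make the mount behaviour concrete, here is a hedged sketch based on the 
Dockerfile in this diff. The `bundle install` step and the `docker run` line in 
the comments are assumptions for illustration, and `<image>` is a placeholder.
   
   ```dockerfile
   FROM ruby:2.7.8-slim-bullseye
   WORKDIR /site
   
   # These copies live in the image at /site/Gemfile and /site/Gemfile.lock.
   COPY Gemfile Gemfile.lock /site/
   
   # Assumed step: in the official ruby images, gems are installed under
   # GEM_HOME (outside /site), so a mount over /site does not affect them.
   RUN bundle install
   
   # At run time, a bind mount over the same path, for example
   #   docker run --rm -v "$PWD":/site <image>
   # makes the container see the host's copy of /site. The files copied
   # above stay in the image but are not visible while the mount is in place.
   ```
   
   Running the same image without the `-v` option shows the copied Gemfiles 
again, since the image itself is never modified by a mount.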
   
   > 
   > If we're mounting over the copied Gemfiles, then there's no reason to 
create a second validator image... we can just modify the Gemfile during the 
base build, install _all_ the gems to the image during the base image build. 
When we mount our website repo to render the Jekyll, we hide the modified 
Gemfiles. When we mount our rendered `_site` directory to do the validation, we 
re-expose the modified Gemfiles. We don't need a second image at all... we can 
do everything we want with the single image.
   > 
   > Actually, we don't even need to copy over the Gemfiles during the build 
stage... those aren't needed for the htmlproofer. htmlproofer has need of its 
own Gemfiles. We're only copying over the ones from the repo and modifying 
them... but we don't need to do that... we can just have a separate set for the 
htmlproofer... and we don't even need the lock one at all for that.
   > 
   > I think we can simplify things dramatically by just using one container.
   
   My original design had the single container but I had split it into two to 
restrict the Gemfile changes. 
   So if there's a better way to separate Gemfile changes I'm happy to try 
that. 
   I do agree that it massively simplifies things. 
   
   > 
   > I do have one more question about WORKDIR... it seems that's overloaded to 
specify the current working directory inside the build container as well as the 
runtime container. But, I don't see why that necessarily needs to be the case.
   
   I'm not exactly sure what question you're asking here, but I'll try and 
provide the information I know. 
    
   So WORKDIR sets the PWD for any following RUN, COPY, ADD, ENTRYPOINT, and 
CMD commands in the Dockerfile. 
   
   If you don't set WORKDIR explicitly then it will default to `/` if it was 
not previously defined by the image in your FROM command.
    
   Typically you would set the WORKDIR to ensure that all of your operations 
are running from an intended place, vs happening in unexpected locations within 
the container. 
   
   If you wanted your runtime container to run things in a different location 
than the build commands, you can simply add another WORKDIR line in your 
Dockerfile and any commands defined after it will use the new location (like 
CMD). 
   
   https://docs.docker.com/engine/reference/builder/#workdir 
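   
   A minimal sketch of using two WORKDIR lines, assuming a hypothetical 
`/build` directory for the build steps and the `/mnt/workdir` mount point 
discussed earlier. The `bundle install` step and the CMD shown are also 
assumptions, not the ones from this PR.
   
   ```dockerfile
   FROM ruby:2.7.8-slim-bullseye
   
   # Build-time location (hypothetical): COPY and RUN below resolve
   # relative paths against /build.
   WORKDIR /build
   COPY Gemfile Gemfile.lock ./
   RUN bundle install
   
   # Runtime location: every instruction after this line, including CMD,
   # uses /mnt/workdir as its working directory.
   WORKDIR /mnt/workdir
   
   # Assumed command for illustration. It runs from /mnt/workdir, where the
   # repository would be mounted when the container starts.
   CMD ["bundle", "exec", "jekyll", "serve", "--host", "0.0.0.0"]
   ```
   
   The working directory can also be overridden for a single container with 
`docker run -w <dir>`, without changing the Dockerfile.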
   
   > 
   > Also, one thing I was thinking was that it can be really hard to verify 
links using the rendered HTML, because you don't have an absolute directory to 
resolve absolute paths. The generated site is intended to be served at the root 
of a webserver. I don't know if htmlproofer supports this, but it'd probably be 
better to run the validation against `http://localhost:4000/` rather than 
`file:///site/_site/`.
   > 
   htmlproofer seems explicitly designed to run against rendered HTML code in a 
directory. I didn't see anything in the documentation related to pointing it at 
a live site.
   The project does support [adjusting for a 
baseurl](https://github.com/gjtorikian/html-proofer#adjusting-for-a-baseurl)
   
   > If we want to get this in quickly, we could get the Dockerfile in without 
the validator stuff right away. I think that because the validator stuff adds 
so much complexity (extra steps, extra documentation, maybe extra containers) 
that still needs polishing, I want to go back to my much earlier opinion that 
the validation stuff should be added in a separate PR. At this point, I think 
it's likely to hold up this PR further.
   
   At this point I agree. 
   
   Figuring out how to support gem isolation is important, but we can get the 
base image Docker changes in now and provide a benefit for any developers 
generating documentation. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

