Re: Review Request 25237: Avoid Docker pull on each run

Timothy Chen Mon, 01 Sep 2014 15:23:28 -0700


> On Sept. 1, 2014, 9:57 p.m., Tom Arnfeld wrote:
> > Couple of points, please correct me if I've misunderstood anything :)
> > 
> > Can you not just do a `docker run .. {image} ..` and let docker take care 
> > of pulling the image if needed? By default, docker will pull the image if 
> > one with the same registry/repo/tag combo doesn't exist.
> > 
> > The assumption here is that an image (comprised of {registry + repository + 
> > tag}) is never going to change. For example, the default tag used by docker 
> > is `latest`, which suggests to me that you can push new versions of your 
> > image to a registry, and update the `latest` tag to point to the new image. 
> > After this change in mesos, I would need to log in to every mesos slave 
> > that had ever downloaded this image, and run a `docker pull`.
> > 
> > The alternative is of course to use new tags for every new image (e.g. git 
> > hashes). Though this means I need to update every framework that has been 
> > configured with docker image names and change them to the new tag. I can 
> > see the appeal of this approach when thinking solely about service 
> > schedulers, because it could be problematic to control a rolling release if 
> > any new task will automatically run the new image (as it takes the latest 
> > image from the registry).
> > 
> > I've actually raised this issue several times with various people in the 
> > docker community and never managed to get a concrete answer other than just 
> > run `docker pull` every time (which is what we've been doing outside of 
> > mesos). I think the difference between these use cases needs to be given 
> > some serious thought, as it's caused us pain in various ways, hence why we 
> > ended up running `docker pull` before every task to avoid the problem.
> > 
> > A working example would be the redis repository 
> > (https://registry.hub.docker.com/v1/repositories/redis/tags), you'll see 
> > that the `latest` tag is pointing at version 2.8. This tag is updated every 
> > time a new image is published, and if I were to use the `latest` tag (or 
> > not specify a tag, since it's the default) I would need to either explicitly 
> > change my deployment of redis to use a strict version, or manually `docker 
> > pull` on all slaves and restart all the tasks using this container image.
> > 
> > It's also important to take into consideration long-running frameworks like 
> > Hadoop on Mesos. If this change were merged, then to avoid logging into 
> > every slave and running `docker pull` we would need to restart the 
> > JobTracker and change the image to a newer (never previously used) tag, 
> > as opposed to new TaskTrackers automatically being launched inside the new 
> > image.
> > 
> > I guess a fair amount of this depends on what you're expecting to get from 
> > using Docker. Software deployment or just dependency management and 
> > isolation?
> > 
> > I'm not against running `docker inspect && docker pull` on every slave in 
> > the cluster, but I'd like the requirement to do that to be chosen. Perhaps 
> > you guys have already had this discussion... I'm very interested to see 
> > what others have been doing to solve this problem.

Hi Tom, there are definitely a lot of trade-offs here, and honestly I don't 
think there are obvious choices.
We could do a docker pull on each run, which is what we originally did, but 
that hits several problems. It relies on the registry server being up at all 
times, which has proven not to be the case. It is also limited by the 
scalability of the registry server, and it prevents anyone from running local 
images.

However, without a pull you don't necessarily get the newest image if you 
specify no tag (and so default to latest).

As you mentioned, docker run currently doesn't pull automatically if the image 
already exists locally, and I'm simply matching those semantics for now. If 
users really want to guarantee which image they're running, I think specifying 
the exact tag for the image is the best way to go, rather than relying on 
latest, which isn't reliable since even docker run doesn't refresh it.
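
To make the inspect-then-pull behaviour concrete, here is a minimal sketch in 
Python. It is not the code in the patch (that lives in 
src/slave/containerizer/docker.cpp); the names ensure_image, run_quiet and 
Runner are hypothetical, and it assumes `docker inspect <image>` exits 
non-zero when nothing by that name exists locally.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical helper type: takes an argv list, returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_quiet(argv: Sequence[str]) -> int:
    """Run a command with its output discarded and return its exit code."""
    return subprocess.run(
        list(argv), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def ensure_image(image: str, run: Runner = run_quiet) -> bool:
    """Pull `image` only when `docker inspect` cannot find it locally.

    Returns True if a pull was issued, False if the local copy was reused.
    """
    if run(["docker", "inspect", image]) == 0:
        return False  # Already present locally: no registry call.
    if run(["docker", "pull", image]) != 0:
        raise RuntimeError("docker pull failed for " + image)
    return True
```

Note that with this check a locally cached latest is reused as-is, which is 
exactly the trade-off you raised; pinning an exact tag sidesteps it.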

It sure can be made optional, but in all the use cases I've heard so far no 
one has required a docker pull on each run, and most people are surprised that 
we pull each time. I'm trying not to expose too many knobs that aren't 
necessary. 

To answer your docker run {image} point: we intentionally separate pulling the 
docker image and running it into two phases, because we like to know exactly 
which phase the docker process is in. It's also easier to reason about when 
integrated into Mesos, as we need to handle a container being destroyed at any 
point in time.
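
As a rough illustration of why two explicit phases help, here is a small 
Python sketch. It is an assumption-laden toy, not the Mesos containerizer: 
the Phase and Launch names and the pull/run callbacks are all hypothetical. 
The point is only that a destroy request arriving during the pull can be 
noticed before the run phase ever starts.

```python
from enum import Enum
from typing import Callable, Optional


class Phase(Enum):
    PULLING = "pulling"
    RUNNING = "running"
    DESTROYED = "destroyed"


class Launch:
    """Tracks which phase a (hypothetical) container launch is in."""

    def __init__(self, image: str,
                 pull: Callable[[str], None],
                 run: Callable[[str], None]) -> None:
        self.image = image
        self._pull = pull
        self._run = run
        self.phase: Optional[Phase] = None
        self._destroy_requested = False

    def destroy(self) -> None:
        """Record a destroy request; it is honoured at the phase boundary."""
        self._destroy_requested = True

    def start(self) -> bool:
        """Pull, then run, unless destroy() was called in the meantime.

        Returns True if the run phase was reached.
        """
        self.phase = Phase.PULLING
        self._pull(self.image)
        if self._destroy_requested:
            # Destroyed while pulling: never start the container.
            self.phase = Phase.DESTROYED
            return False
        self.phase = Phase.RUNNING
        self._run(self.image)
        return True
```

With a single docker run {image} call there is no such boundary to check at, 
and no way to tell from the outside whether the process is still pulling.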

- Timothy

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25237/#review52004
-----------------------------------------------------------

On Sept. 1, 2014, 7:16 p.m., Timothy Chen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25237/
> -----------------------------------------------------------
> 
> (Updated Sept. 1, 2014, 7:16 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Jie Yu.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Avoid Docker pull on each run.
> 
> Currently each Docker run will run a docker pull which calls the docker 
> registry each time.
> To avoid this, this patch adds a docker inspect <image> and skips calling 
> pull if the image already exists.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/docker.cpp 0febbac5df4126f6c8d9a06dd0ba1668d041b34a 
> 
> Diff: https://reviews.apache.org/r/25237/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Timothy Chen
> 
>
