Re: Review Request 25237: Avoid Docker pull on each run

Timothy Chen Mon, 01 Sep 2014 21:07:11 -0700


> On Sept. 1, 2014, 9:57 p.m., Tom Arnfeld wrote:
> > Couple of points, please correct me if I've misunderstood anything :)
> > 
> > Can you not just do a `docker run .. {image} ..` and let docker take care 
> > of pulling the image if needed? By default, docker will pull the image if 
> > one with the same registry/repo/tag combo doesn't exist.
> > 
> > The assumption here is that an image (comprised of {registry + repository + 
> > tag}) is never going to change. For example, the default tag used by docker 
> > is `latest`, which suggests to me that you can push new versions of your 
> > image to a registry, and update the `latest` tag to point to the new image. 
> > After this change in mesos, I would need to log in to every mesos slave 
> > that had ever downloaded this image, and run a `docker pull`.
> > 
> > The alternative is of course to use new tags for every new image (e.g. git 
> > hashes). Though this means I need to update every framework that has been 
> > configured with docker image names and change them to the new tag. I can 
> > see the appeal of this approach when thinking solely about service 
> > schedulers, because it could be problematic to control a rolling release if 
> > any new task will automatically run the new image (as it takes the latest 
> > image from the registry).
> > 
> > I've actually raised this issue several times with various people in the 
> > docker community and never managed to get a concrete answer other than just 
> > run `docker pull` every time (which is what we've been doing outside of 
> > mesos). I think the difference between these use cases needs to be given 
> > some serious thought, as it's caused us pain in various ways, hence why we 
> > ended up running `docker pull` before every task to avoid the problem.
> > 
> > A working example would be the redis repository 
> > (https://registry.hub.docker.com/v1/repositories/redis/tags), you'll see 
> > that the `latest` tag is pointing at version 2.8. This tag is updated every 
> > time a new image is published, and if I were to use the `latest` tag (or 
> > not specify a tag, since it's the default) I would need to either explicitly 
> > change my deployment of redis to use a strict version, or manually `docker 
> > pull` on all slaves and restart all the tasks using this container image.
> > 
> > It's also important to consider long-running frameworks like Hadoop on 
> > Mesos. If this change were merged, then to avoid logging into every slave 
> > and running `docker pull` we would need to restart the JobTracker and 
> > change the image to a newer (never previously used) tag, as opposed to new 
> > TaskTrackers automatically being launched inside the new image.
> > 
> > I guess a fair amount of this depends on what you're expecting to get from 
> > using Docker. Software deployment or just dependency management and 
> > isolation?
> > 
> > I'm not against running `docker inspect && docker pull` on every slave in 
> > the cluster, but I'd like the requirement to do that to be chosen. Perhaps 
> > you guys have already had this discussion... I'm very interested to see 
> > what others have been doing to solve this problem.
> 
> Timothy Chen wrote:
>     Hi Tom, there are definitely lots of trade-off questions, and honestly I 
> don't think there are obvious choices.
>     We could allow docker pull on each run, which we originally did, but that 
> hits several problems, such as relying on the registry server being up at all 
> times, which has proved not to be the case. It is also limited by the 
> scalability of the registry server, and it no longer allows anyone to run 
> local images.
>     
>     However, without a pull you don't necessarily get the very latest image 
> if you simply specify no tag.
>     
>     Currently, as you mentioned, docker run's semantics are to not auto-pull 
> if the image already exists locally, and I'm simply matching that for now. If 
> users really want to guarantee which image they're running, I think 
> specifying the exact tag for the image is the best way to go, rather than 
> relying on latest, which isn't reliable since even docker run doesn't refresh 
> it.
>     
>     It sure can be optional, but so far, from all the use cases I've heard, 
> no one has required a docker pull on each run, and most people are surprised 
> that we pull each time. I'm trying not to expose too many knobs that are not 
> necessary. 
>     
>     And to answer your docker run {image} point: we intentionally separate 
> the docker image pulling and running into two phases, as we like to know 
> exactly which phase the docker process is in, and it's also easier to reason 
> about when integrated into Mesos, as we need to handle a container being 
> destroyed at any point in time.
> 
> Tom Arnfeld wrote:
>     All very fair points, thanks for the clarification.


Thanks for the comments! I think it's valuable to keep discussing these, and 
options are still wide open as well, nothing is set in stone :)
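
For reference, the inspect-then-pull behaviour discussed above can be sketched 
roughly as follows. This is a Python illustration only, not the actual C++ in 
src/slave/containerizer/docker.cpp. The names `ensure_image` and `Runner` are 
hypothetical, and the `force_pull` flag is an assumption that stands in for 
the opt-in "always pull" knob Tom asked about; it is not part of the patch.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical helper type: runs a command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command, discard its output, and return the exit code."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def ensure_image(image: str, force_pull: bool = False, run: Runner = _run) -> bool:
    """Make sure `image` is available locally.

    Returns True if a `docker pull` was issued, False if it was skipped.
    `force_pull` is a hypothetical opt-in knob, not something the patch adds.
    """
    # Phase 1: `docker inspect <image>` exits non-zero when the image is not
    # present locally, so a zero exit code means the registry can be skipped.
    if not force_pull and run(["docker", "inspect", image]) == 0:
        return False

    # Phase 2: the image is missing (or a refresh was requested), so pull it.
    if run(["docker", "pull", image]) != 0:
        raise RuntimeError(f"docker pull failed for {image}")
    return True
```

With an exact tag such as `ensure_image("redis:2.8")` the registry is contacted 
only the first time a slave sees the image. With a moving tag such as `latest`, 
a locally cached copy is reused unless `force_pull=True` is passed, which is 
the trade-off this thread is about.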


- Timothy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25237/#review52004
-----------------------------------------------------------


On Sept. 1, 2014, 7:16 p.m., Timothy Chen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25237/
> -----------------------------------------------------------
> 
> (Updated Sept. 1, 2014, 7:16 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Jie Yu.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Avoid Docker pull on each run.
> 
> Currently each Docker run will run a docker pull which calls the docker 
> registry each time.
> To avoid this, this patch adds a docker inspect <image> and skips calling 
> pull if the image already exists.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/docker.cpp 0febbac5df4126f6c8d9a06dd0ba1668d041b34a 
> 
> Diff: https://reviews.apache.org/r/25237/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Timothy Chen
> 
>
