Hi Laura,
> I am preparing a talk to introduce the community in a meetup, and one of
> the topics the people want to know about is its relationship with
> Docker.

There is no direct relationship between the two.  They serve different purposes and there’s little overlap, but there’s confusion about what Docker can do, and that has often led people to assume that it does far more for reproducibility than it actually does.

Docker and systems like it are a combination of a “disk image” (i.e. an archive containing files and directories) and a tool to bind together Linux kernel features that can be used for virtualizing parts of the system.

There are different levels of virtualization.  One can emulate a CPU and all hardware in software and run a full operating system on that fully virtualized computer.  Or one can use special hardware support to avoid having to implement the CPU in software.  Similar features exist to avoid having to implement other hardware in software (“paravirtualization”).  But traditionally with Linux that’s as far as things would go: you’d still virtualize the whole computer, the whole disk, and run a full operating system on it.

With the Hurd, on the other hand, virtualization was always fine-grained.  You can, for example, redefine the directory tree as seen by a single process (i.e. file system virtualization).  Or you could virtualize the network interface and share everything else.

With Linux, full-machine virtualization turned out to be a little too much for many applications.  So eventually Linux gained features to virtualize the process namespace (a process can think it runs alone on the machine), the user account namespace (a process can think it runs as root), cgroups to virtualize e.g. memory (a process can use *all* the available memory, but the system pretends only a fraction of the memory is “all the available memory”), etc.  With bind mounts and chroot (and others) Linux also has a way to virtualize the file system (and e.g. pretend that a certain directory is the root directory).  You can play with these kernel features directly, without Docker; see the “unshare” sketch below.

Docker coordinates all of these features and thus enables fine-grained virtualization for individual applications: applications can be delivered as big disk archives that are unpacked, mounted, and then declared to be the root file system from the point of view of the application.  This makes application deployment really easy: just take a big disk image that contains the application and *all* its dependencies, and share the host system’s kernel, CPU, memory, and other file systems.

Some people thought that this ease of deployment also translates to increased reproducibility: given the same disk image one could run an application off that image without having any of the host system’s libraries affect its operation.  This is hardly different from old-school full system virtualization; it’s just more convenient and a tad lighter.

Docker does not care about reproducibility of the disk images it uses.  Generating these images at different points in time usually yields different disk images.  The images can also contain way more stuff than needed.  There is no direct correspondence between the contents of the disk image and the commands in a Dockerfile that may be used to generate the disk image.  (The Dockerfile sketch below shows why builds drift over time.)  This can even be a security problem, as the disk archive cannot be verified and hence cannot be trusted.

Guix is not a container system, but when used on Linux it does support the same virtualization features (they are also exposed by “guix environment”, for example).
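To make the namespace features mentioned above a bit more concrete: on a typical Linux system the “unshare” tool from util-linux lets you enter fresh namespaces directly, no Docker involved.  A minimal sketch (the available flags depend on your util-linux version and kernel configuration):

    # Run a shell in fresh user, PID, and mount namespaces.
    # --map-root-user makes the shell *appear* to run as root;
    # --fork is needed so the shell becomes a child in the new PID
    # namespace; --mount-proc mounts a private /proc so tools like
    # ps only see processes of this namespace.
    $ unshare --user --map-root-user --pid --fork --mount-proc sh
    # id     -> uid=0(root) …   (we only look like root)
    # ps ax  -> just this shell and ps itself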
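And here is the Dockerfile sketch referenced above.  It is a hypothetical example, but it shows the common pattern: nothing in this file pins the exact software that ends up in the image, so building it today and building it next month generally yields different images.

    # “stable” is a moving tag: it points at different snapshots
    # of Debian over time.
    FROM debian:stable
    # Installs whatever versions the archive happens to offer on
    # the day the image is built.
    RUN apt-get update && apt-get install -y curl

The resulting image is a snapshot of a point in time, not a function of the Dockerfile.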
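For instance, a sketch assuming a Guix installation on Linux (the package name is just an example):

    # Spawn a shell inside a container: fresh namespaces, and a file
    # system showing only the requested packages (and what they
    # refer to) from /gnu/store.
    $ guix environment --container --ad-hoc python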
Guix is concerned with providing a path from the declaration of a full environment to a reproducible binary.  Guix can thus be used to generate Docker disk images in a reproducible fashion (see the “guix pack” sketch below).  Guix does without disk images (it installs things at a finer granularity, into individual directories under /gnu/store) but still achieves isolation and separation of packages.  Together with Linux container/virtualization features it can be used for pretty much the same things that Docker is often used for.
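A sketch of that workflow (the exact store file name will differ on your machine):

    # Build a Docker image containing guile and its complete
    # dependency closure; the same inputs yield the same image.
    $ guix pack -f docker guile
    /gnu/store/…-docker-pack.tar.gz

    # Load it with the stock Docker tooling.
    $ docker load < /gnu/store/…-docker-pack.tar.gz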
--
Ricardo