I'm using LXD under Ubuntu 16.04 with ZFS.
I want to use an existing container snapshot as a cloning base to create
other containers. As far as I can see, this is done via "lxc publish",
although I can't find much documentation on it apart from this blog post:
https://insights.ubuntu.com/2015/06/30/publishing-lxd-images/
My question is about ZFS disk space usage. I was hoping that the
publish operation would simply take a snapshot of the existing container,
and therefore use no additional local disk space, but in fact it seems to
consume the full amount of disk space all over again. Let me demonstrate.
First, the clean system:
root@vtp:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
lxd 77G 95.5K 77.0G - 0% 0% 1.00x ONLINE -
Now I create a container, then a couple more, from the same image:
root@vtp:~# lxc launch ubuntu:16.04 base1
Creating base1
Retrieving image: 100%
Starting base1
root@vtp:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
lxd 77G 644M 76.4G - 0% 0% 1.00x ONLINE -
root@vtp:~# lxc launch ubuntu:16.04 base2
Creating base2
Starting base2
root@vtp:~# lxc launch ubuntu:16.04 base3
Creating base3
Starting base3
root@vtp:~# lxc exec base1 /bin/sh -- -c 'echo hello >/usr/test.txt'
root@vtp:~# lxc stop base1
root@vtp:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
lxd 77G 655M 76.4G - 0% 0% 1.00x ONLINE -
So disk space usage is about 645MB for the image, and small change for
the instances launched from it. Now I want to clone further containers
from base1, so I publish it:
root@vtp:~# time lxc publish base1 --alias clonemaster
Container published with fingerprint:
80ec0105da9d1f8f173e45233921bc772319e39364c322786a5b4cfec895cb68
real 0m45.155s
user 0m0.000s
sys 0m0.012s
root@vtp:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
lxd 77G 1.27G 75.7G - 0% 1% 1.00x ONLINE -
root@vtp:~# zfs list -t snapshot
NAME                                                                                   USED  AVAIL  REFER  MOUNTPOINT
lxd/images/80ec0105da9d1f8f173e45233921bc772319e39364c322786a5b4cfec895cb68@readonly      0      -   638M  -
lxd/images/f4c4c60a6b752a381288ae72a1689a9da00f8e03b732c8d1b8a8fcd1a8890800@readonly      0      -   638M  -
I notice that (a) publish is a slow process, and (b) disk usage has
doubled. Finally, I launch a container from the new image:
root@vtp:~# lxc launch clonemaster myclone
Creating myclone
Starting myclone
root@vtp:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
lxd 77G 1.27G 75.7G - 0% 1% 1.00x ONLINE -
That's fine - it's sharing with the image as expected.
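As an aside, I believe the zfs "origin" property shows this sharing
directly; the dataset path below is just my guess at LXD's default layout:

  # "origin" reports which snapshot, if any, each dataset was cloned from
  zfs list -r -o name,origin,used lxd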
Now, what I was hoping for was that the named image (clonemaster) would
be a snapshot derived directly from the parent, so that it would also
share disk space. What I'm actually trying to achieve is a workflow like
this (rough commands sketched after the list):
- launch (say) 10 initial master containers
- customise those 10 containers in different ways (e.g. install
different software packages in each one)
- launch multiple instances from each of those master containers
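Roughly, the commands I have in mind are these (names and packages below
are just placeholders):

  # build and customise one master (placeholder names throughout)
  lxc launch ubuntu:16.04 master1
  lxc exec master1 -- apt-get install -y some-package
  lxc stop master1
  # turn it into an image, then stamp out the lab instances from it
  lxc publish master1 --alias master1-image
  lxc launch master1-image student1a
  lxc launch master1-image student1b
  # ...and the same again for master2 .. master10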
This is for a training lab. The whole lot will then be packaged up and
distributed as a single VM. It would be hugely helpful if the initial
ZFS usage came to around 650MB rather than 6.5GB (i.e. ~10 published
master images at ~650MB each).
The only documentation I can find about images is here:
https://github.com/lxc/lxd/blob/master/doc/image-handling.md
It talks about the tarball image format: is it perhaps the case that
"lxc publish" is creating a tarball, and then untarring it into a fresh
snapshot? Is that tarball actually stored anywhere? If so, I can't find
it. Or is the tarball created dynamically when you do "lxc image copy"
to a remote? If so, why not just use a zfs snapshot for "lxc publish"?
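If publish does build a tarball, I'd expect it to land in LXD's on-disk
image store, which I'm guessing (stock Ubuntu packaging) would be here:

  ls -lh /var/lib/lxd/images/    # my guess at the default image store location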
<<Digs around>> Maybe it's done this way because "A dataset cannot be
destroyed if snapshots of the dataset exist"
<http://docs.oracle.com/cd/E19253-01/819-5461/6n7ht6r4f/>, i.e. using a
snapshot for publish would prevent the original container from being
deleted. That makes sense - although I suppose the original could have its
contents rm -rf'd and then be renamed
<http://docs.oracle.com/cd/E19253-01/819-5461/gamnn/index.html> to a
graveyard name.
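To spell out the constraint I mean at the raw zfs level (dataset names
here are made up, purely for illustration):

  zfs snapshot lxd/containers/base1@pub
  zfs clone lxd/containers/base1@pub lxd/images/someimage
  zfs destroy -r lxd/containers/base1   # refused while the clone depends on the snapshot
  zfs promote lxd/images/someimage      # ...unless the clone is promoted first,
  zfs destroy -r lxd/containers/base1   # after which the original can be destroyed

(zfs promote reverses the parent/child relationship between a clone and
its origin, so in principle the original could still be removed.)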
The other option I can think of is ZFS dedup. The finished target
system won't have the resources to run dedup continuously; however, I
could turn dedup on just for the duration of the cloning, do the cloning,
and then turn it off again (*).
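In other words, something along these lines - with the caveat (see the
P.S. below) that dedup only applies to blocks written while it is on:

  zfs set dedup=on lxd
  # ...do all of the "lxc publish" / cloning work here...
  zfs set dedup=off lxd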
Have I understood this correctly? Any additional clues gratefully received.
Thanks,
Brian Candler.
(*) P.S. I did a quick test of this. It looks like it doesn't
deduplicate against any pre-existing data:
root@vtp:~# zfs set dedup=on lxd
root@vtp:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
lxd 77G 1.27G 75.7G - 0% 1% 1.00x ONLINE -
root@vtp:~# lxc exec base2 /bin/sh -- -c 'echo world >/usr/test.txt'
root@vtp:~# lxc stop base2
root@vtp:~# lxc publish base2 --alias clonemaster2
Container published with fingerprint:
8a288bd1364d82d4d8afb23aee67fa13586699c539fad94e7946f60372767150
root@vtp:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
lxd 77G 1.88G 75.1G - 1% 2% 1.05x ONLINE -
But then I rebooted, and published another image:
root@vtp:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
lxd 77G 1.87G 75.1G - 1% 2% 1.05x ONLINE -
root@vtp:~# lxc exec base3 /bin/sh -- -c 'echo world2 >/usr/test.txt'
root@vtp:~# lxc stop base3
root@vtp:~# time lxc publish base3 --alias clonemaster3
Container published with fingerprint:
6abbeb5df75989944a533fdbb1d8ab94be4d18cccf20b320c009dd8aef4fb65b
real 0m55.338s
user 0m0.008s
sys 0m0.008s
root@vtp:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
lxd 77G 1.88G 75.1G - 1% 2% 2.11x ONLINE -
So I suspect it would all have worked if I'd turned dedup on before the
very first image was fetched, since dedup only applies to blocks written
after it is enabled.
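For what it's worth, I believe "zdb -S" can simulate dedup over a pool's
existing contents and report the ratio you would get, which would be a
way to test that theory without rebuilding anything:

  zdb -S lxd    # simulate dedup on existing data; can be slow and memory-hungry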