On Mon, May 25, 2020 at 02:21:48AM +0200, Thomas Goirand wrote:
> On 5/24/20 11:39 PM, Bastian Blank wrote:
> > On Sun, May 24, 2020 at 11:26:40PM +0200, Thomas Goirand wrote:
> >> So I was wondering if we could:
> >> 1/ Make the resulting extracted disk smaller. That'd be done in FAI,
> >> and I have no idea how that would be done. Thomas, can you help, at
> >> least giving some pointers on how we could fix this?
> >
> > Fix what?
>
> The fact that the raw image is 2GB once extracted, when it could be
> 1/4th of that.
I don't think it's obvious how to do better. The only ways I know to
make a raw image smaller than its fs are:

1) sparse files
2) compression

FAI is using #1, and you want to avoid #2. Do you know another way?

> >> 2/ Published the raw disk directly without compression (together
> >> with its compressed form), so one can just point to it with Glance
> >> for downloading. BTW, I don't see the point of having a tarball
> >> around the compressed form, raw.xz is really enough, and would be
> >> nicer because then one can pipe the output of xz directly to the
> >> OpenStack client (I haven't checked, but I think that's maybe
> >> possible).
> >
> > No. Nothing in the download chain supports sparse files, so
> > unwrapped raw images are somewhat out of the question.
>
> I've done this for 3 Debian releases [2], I don't see why we would
> lose the feature because of a "sparse files" thing which you somehow
> find important.

I think Bastian's point is that tar is required to enable downloading
the sparse files, since http can't represent the holes. Otherwise, you
need to transfer the full size of the fs. (There's a quick
demonstration of tar's sparse handling below.)

I checked one of the older OpenStack images you linked to. It behaves
just like the FAI raw images, as far as I can tell:

ross@vanvanmojo:~/tmp$ curl -L -o disk.raw https://cdimage.debian.org/cdimage/openstack/archive/8.0.0/debian-8.0.0-openstack-amd64.raw
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   361  100   361    0     0    512      0 --:--:-- --:--:-- --:--:--   512
100 2048M  100 2048M    0     0  11.0M      0  0:03:06  0:03:06 --:--:-- 10.4M
ross@vanvanmojo:~/tmp$ ls -lh disk.raw
-rw-r--r-- 1 ross ross 2.0G May 25 07:57 disk.raw
ross@vanvanmojo:~/tmp$ du -h disk.raw
2.1G    disk.raw

Did I miss something?

> So what you're talking about is just having a sparse *temporary*
> file, before the upload to Glance. Do we care, when what I'm
> proposing is to get rid of this extra step of downloading, before
> uploading to Glance?

Is avoiding the extra download step more important than reducing the
image size? Your first mail raised both issues, and FWIW, I thought
you were mostly concerned about the size.

To avoid the extra download for Glance, maybe it makes sense to use
the upload stage of the pipeline. We could treat the generation of the
preferred format for OpenStack like we treat the EC2 registration
step, for example.

> >> Another thing which bothers me, is that in our current
> >> publication, there's no way to tell what image is from which
> >> point release.
> >
> > What is the significance of that? We use stuff from security
> > primarily, so the point releases don't show what might be in the
> > image.
>
> Of course the point releases show what will be in the image. For
> example, if a cloud user spawns a new instance using an image which
> is from the latest point release, he knows a bunch of (non-security
> fixed) packages won't need upgrades (for example, at least
> base-files, but often many others as well, like for example tzdata).

As a cloud user, I never want to care about point releases. There's
usually a way to identify the latest image of a given release. For
example, on AWS and GCP, the api can search for the latest debian 10
image. Many deployment tools integrate this functionality, so I can
always deploy the latest debian 10 image. I've never used OpenStack
though, so I don't know if it has similar features.
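For reference, the AWS lookup can be done entirely from the cli. This
is a sketch from memory, not a tested command - the owner id should be
Debian's AWS account and the name pattern matches what I recall of the
buster AMIs, so verify both before relying on it:

$ aws ec2 describe-images \
      --owners 136693071363 \
      --filters 'Name=name,Values=debian-10-amd64-*' \
      --query 'sort_by(Images, &CreationDate)[-1].[ImageId,Name]' \
      --output text

The --query expression sorts by creation date and takes the newest
entry, which is how the "give me the latest debian 10" workflow avoids
caring about point releases entirely.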
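Coming back to the sparse file / tar point above, here's a minimal
sketch of what I mean, using coreutils and GNU tar (the file names are
made up, and the sizes are what I'd expect rather than a captured
session):

# An empty 2G sparse file: the apparent size is 2G, but no blocks
# are allocated on disk.
$ truncate -s 2G disk.raw
$ ls -lh disk.raw    # shows the apparent size: 2.0G
$ du -h disk.raw     # shows the allocated size: 0

# GNU tar can record the holes instead of storing them as zeroes,
# and restores the file as sparse again on extraction:
$ tar -cSf disk.tar disk.raw
$ ls -lh disk.tar    # tiny - header blocks, not 2G

# Plain http has no equivalent of -S, which is why downloading the
# unwrapped .raw transfers (and allocates) the full 2G, as in the
# curl test above.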
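And on Thomas's raw.xz idea: if the OpenStack client really does
accept image data on stdin (Thomas says he hasn't verified that, and
neither have I), the Glance upload could be a single pipeline,
something like this sketch (the URL and image name are placeholders):

$ curl -sL https://example.org/debian-10-openstack-amd64.raw.xz \
      | xz -dc \
      | openstack image create \
            --container-format bare \
            --disk-format raw \
            debian-10

Note that a pipe can't carry holes either, so this would help with the
extra download step, not with pushing the full 2G of mostly-zero data
into Glance.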
> Someone may also want to run the image matching a given point
> release, together with snapshot.debian.org (for example, just to
> test upgrades, and many other possible scenarios).

This is a valid use-case, but I don't think we should optimize for it.
By integrating the point release into the version component, a user
would need to know which point release they want. Currently, using a
debian 10 image gets you the latest point release. Instead, you'd need
to know that e.g. 10.2 was out, and was the latest. I think that's a
bad user experience - most users that I work with know nothing about
Debian's release processes. They'd be confused and frustrated if they
needed to know the point release. Heck, I don't know what point
release of buster we're on.

> So yes, point release numbers do have significance. Images with a
> date that at first appears random, and reveals itself only if
> carefully matched to the point release dates, aren't user friendly
> at all.
>
> If I say: Bastian, can you please give me the image from Buster
> 10.2, it will for sure take you a lot of time to find it out.
> However, look at this archive, which has security updates since
> 8.6.3:

At the last sprint, we discussed building images more frequently to
integrate security updates. Most in the group thought the complexity
of lots of images outweighed the small benefit of avoiding the
security downloads.

> By the way, why are we keeping a history of 233 daily Bullseye
> images? [1] Is this of any use to anyone? The CD team builds images
> weekly, why do we need daily images published at the cloud team? And
> keep them forever, when the CD team does not?

At the last sprint we discussed the stable daily builds, and agreed
that it's not worth keeping them (since they mostly end up being
identical). Probably no one has had time to do anything about it.
Testing & unstable aren't so clear - the notes in [1] indicate that we
had more questions than answers. Have we hit a point where the cost in
disk space is greater than the cost in effort to answer these
questions and fix this?

Ross

[1] - https://gobby.debian.org/export/Sprints/CloudSprint2019/2-%20Building%20images
