On Sat, 2014-10-11 at 23:35 +0200, Lennart Poettering wrote:
> On Fri, 10.10.14 13:52, Alexander Larsson ([email protected]) wrote:
> 
> > So, I've got some kind of initial runtime going, and it's now time to
> > look at how we want to package these runtimes/apps. There are a few
> > requirements, and a bunch of nice to have.
> > 
> > This is what we absolutely require:
> > 
> > * Some kind of format for an application that is delivered over the
> >   network. This will contain metadata + content (a set of files).
> > 
> > * A format for the application when installed on a system. This has to
> >   be done in such a way that we can access content via the normal
> >   kernel fs syscalls.
> 
> I am pretty sure these two formats need to be very close to each
> other, otherwise all the stuff like signatures that are checked on
> access is really hard to do.

I agree, the network transport format pretty much follows from the
decision on how the installed form looks.

> > * Don't pass untrusted data to the kernel. For instance, it is risky
> >   to download raw filesystem data and then mount that, or mount a
> >   loopback file that the user can modify. The raw filesystem data is
> >   directly parsed by the kernel and weird data there can cause kernel
> >   panics.
> 
> Well, this is unavoidable if we ever want to allow fully signed
> systems. I mean, again, I would not isolate the problem of app images
> so much from the problem of OS images. I want to solve this at the
> same time, as the problems with verification, distribution and so on
> are pretty much the same. 

In some sense it is unavoidable. We have to tie the exact file data to
the signature. However, does this mean we have to shove random bits at
the kernel rather than going through the syscall interface?

btrfs-receive is a userspace tool that uses the regular userspace I/O
syscalls to do its modifications. How is it supposed to handle the
signatures? If it can do that, why would it not be possible for us to
do the same ourselves?

> I also really don't believe that the kernel would be any worse with
> verifying structural integrity of images than userspace code...

I don't think that is a proper comparison. The "verifying" that the
userspace install code does is run in an unprivileged mode that then
feeds the resulting data via the well-tested syscall interface to the
kernel. However, the parsing of the on-disk filesystem structures is
done in a very highly privileged mode in the kernel.

That said, btrfs-receive is a userspace tool, so it doesn't quite fit
what I describe above about mounting pre-created filesystem images.
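
To make the distinction concrete, here is a minimal sketch of the
userspace path, assuming a per-file SHA-256 digest is available from
some already-trusted metadata. The function name install_verified and
the digest scheme are hypothetical, not something any existing tool
defines. All checking happens in an unprivileged process, and the kernel
only ever sees ordinary write and rename calls:

```python
import hashlib
import os
import tempfile


def install_verified(payload: bytes, expected_sha256: str, dest: str) -> None:
    """Check the payload in userspace, then write it with ordinary file I/O.

    Hypothetical helper: the digest is assumed to come from trusted metadata.
    """
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected_sha256:
        # Reject before anything reaches the destination path.
        raise ValueError(f"digest mismatch for {dest}")

    # Write to a temporary file in the same directory, then rename it into
    # place, so a partially written file never appears under the final name.
    dirname = os.path.dirname(dest) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        os.replace(tmp, dest)
    except BaseException:
        os.unlink(tmp)
        raise
```

This only checks at install time; it does nothing for the lazy,
on-access verification discussed elsewhere in this thread.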

> > * Regular directory
> > 
> >   We require an install phase that explodes the app bundle into
> >   separate files.
> > 
> >   For multi-version storage we can use hardlinks which results in
> >   sharing both disk and page cache between versions at a file-granular
> >   level.
> > 
> >   Install and mounting is doable as non-root, doesn't pass untrusted
> >   data to the kernel and once done allows easy access to exported files.
> > 
> >   However, installation is not atomic, and there is no lazy checking
> >   of checksums or signatures.
> 
> Also, the hardlink farms are certainly not pretty.

They are not pretty, sure. However, they are very widely available, and
they are the *only* solution that allows page-cache sharing between
images and "trivial" deduplication between unrelated images. I don't
think we should dismiss them too easily.
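
As a rough sketch of what I mean by a hardlink farm: each distinct file
is stored once in a content-addressed pool, and every installed version
just hardlinks into it. The names install_version and pool are made up
for illustration, and this assumes the pool and the install directories
live on the same filesystem (hardlinks cannot cross filesystems):

```python
import hashlib
import os
from typing import Dict


def install_version(files: Dict[str, bytes], pool: str, target: str) -> None:
    """Install files (relative path -> content) under target, hardlinking
    each one to a content-addressed object in pool."""
    os.makedirs(pool, exist_ok=True)
    for relpath, content in files.items():
        # Identical content always maps to the same object, regardless of
        # which app or version it came from.
        obj = os.path.join(pool, hashlib.sha256(content).hexdigest())
        if not os.path.exists(obj):
            with open(obj, "wb") as f:
                f.write(content)
        dest = os.path.join(target, relpath)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # Same inode means the same blocks on disk and the same page cache.
        os.link(obj, dest)
```

Two versions (or two unrelated apps) that ship a byte-identical file end
up with the same inode for it, which is where both the disk and the
page-cache sharing come from.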

> > * btrfs volumes
> > 
> >   If the filesystem where we're installing the app is btrfs (either natively
> >   or via a loopback mounted file) we can install the apps in subvolumes.
> >   If the root is btrfs this is easy, but the loopback mounted case is pretty
> >   tricky, as it requires resizing the loopback when needed, etc.
> > 
> >   This is similar to exploding the files, but we can use the subvolume
> >   to share data between different versions of an app. This will share
> >   disk space, but not page cache.
> > 
> >   Removal of apps is atomic, although you can't remove a btrfs volume
> >   until it's not mounted anymore (i.e. the app is not in use anymore).
> > 
> >   Also, btrfs volume removal requires root rights, as does mounting a
> >   loopback btrfs image, so some level of setuid helper is needed.
> > 
> >   btrfs also has an interesting feature where you can btrfs-send a
> >   subvolume, which creates a file describing the diff from the parent
> >   volume and the subvolume. This can then be applied with
> >   btrfs-receive, which is a userspace app that applies a set of file
> >   ops to convert the parent to the new child state. This is, imho, not
> >   super interesting for our usecase. Btrfs-send is rarely what you
> >   want anyway as a newly built version of an app is built from scratch
> >   anyway and not based on the previous version. One can use rsync to
> >   create a new subvolume based on the old one, but then you're using
> >   rsync, not btrfs-send to generate the diffs.
> 
> I absolutely disagree. Kay and I have been discussing this stuff with
> the btrfs folks. The thing is that we want the signatures for the
> files to be transferred in-line. While the signature stuff doesn't exist
> right now for btrfs, the guys working on it are ensuring that the
> signatures can be serialized from btrfs as part of the btrfs send/recv
> image, and then deserialized again on the destination, while staying
> fully valid.

The signature thing is the one real advantage that the btrfs solution
has, and it is something nothing else gives us. 

> Harald has been playing around with some build logic that makes sure
> that rebuilt app updates are efficiently shipped as btrfs send/recv,
> with stable inode numbers and stuff.

How exactly do you envision this working in practice for updates? Say
you have an application that receives regular updates (major and minor).
At any time a user may come in and do a fetch-from-scratch, or an update
between two essentially "random" versions. What does the server store?
A copy of each full image? Only for major versions? A delta between each
pair of consecutive images? A delta between each possible image pair?

It seems to me that a git-like format such as ostree would allow a much
easier and more efficient distribution model for updates on highly
mirrored dumb servers than this.
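
A minimal sketch of why the content-addressed model is friendly to dumb
servers. This is the general idea only, not ostree's actual repository
format or API, and make_manifest and objects_to_fetch are hypothetical
names. The server just serves objects named by their digest; the client
works out what it is missing, so no per-pair deltas need to exist:

```python
import hashlib
from typing import Dict, Set

Manifest = Dict[str, str]  # relative path -> content digest


def make_manifest(files: Dict[str, bytes]) -> Manifest:
    """Describe one version as a mapping from path to content digest."""
    return {path: hashlib.sha256(data).hexdigest()
            for path, data in files.items()}


def objects_to_fetch(have: Set[str], wanted: Manifest) -> Set[str]:
    """Digests the client must download to materialize the wanted version."""
    return set(wanted.values()) - have
```

A fetch-from-scratch is the case where have is empty; an update between
two arbitrary versions downloads only the objects whose content actually
differs, whichever version the client started from.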

> You know, this is explicitly something where we shouldn't reinvent the
> wheel. It's quite frankly crazy to come up with a new serialization
> format, that contains per-file verification data, that then somehow
> can be deserialized on some destination system again back into the fs
> layer...

The hard part is obviously having the kernel verify the signatures. That
requires deep kernel FS work, which doesn't exist yet and which only the
btrfs people are working on. However, when they come up with something,
it could very well be usable for things other than btrfs-receive (as
btrfs-receive is essentially just a stream of syscalls). Are the design
discussions on this happening in the open somewhere?

> I know that the Red Hat fs crew hates btrfs like it was the devil, and
> loves LVM/DM like it was a healthy project. But yuck, just yuck!

I'm not particularly fond of a device-mapper approach either, but I was
listing all options, so it needed to be in there. That said, I'm also a
btrfs user on all my development machines, and I can't say my experience
with it has been exactly stellar...

_______________________________________________
gnome-os-list mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/gnome-os-list
