Re: [systemd-devel] [libvirt] How to make udev not touch my device?

2016-11-11 Thread Lennart Poettering
On Mon, 07.11.16 09:17, Daniel P. Berrange (berra...@redhat.com) wrote:

> On Fri, Nov 04, 2016 at 08:47:34AM +0100, Michal Privoznik wrote:
> > Hey udev developers,
> > 
> > I'm a libvirt developer and I've been facing an interesting issue
> > recently. Libvirt is a library for managing virtual machines and as such
> > allows basically any device to be exposed to a virtual machine. For
> > instance, a virtual machine can use /dev/sdX as its own disk. For
> > security reasons we allow users to configure their VMs to run under
> > a different UID/GID and also SELinux context. That means that whenever a
> > VM is being started up, libvirtd (our daemon) relabels all the
> > necessary paths that the QEMU process (representing the VM) can touch.
> > However, I'm facing an issue that I don't know how to fix. In some cases
> > QEMU can close & reopen a block device. Closing a block device
> > triggers a udev event, and if there is a rule that sets a security
> > label on the device, the QEMU process is unable to reopen the device.
> > 
> > My question is: what can we do to prevent udev from meddling with the
> > security labels that we've set on the devices?
> > 
> > One of the ideas our lead developer had was for libvirt to set some kind
> > of udev label on devices managed by libvirt (when setting up security
> > labels), and then whenever udev sees such a labelled device it won't touch
> > it at all (this could be achieved by a rule perhaps?). Later, when the
> > domain is shutting down, libvirt removes that label. But I don't think
> > setting an arbitrary label on devices is supported, is it?
> 
> Having thought about this over the weekend, I'm strongly inclined to
> just take udev out of the equation by starting a new mount namespace
> for each QEMU we launch and setting up a custom /dev containing just
> the devices we need. This will both improve security and avoid the
> udev races, requires no complex code in libvirt, and will work for
> libvirt all the way back to RHEL6.

I think this would be a pretty nice solution, indeed!

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [libvirt] How to make udev not touch my device?

2016-11-11 Thread Lennart Poettering
On Fri, 11.11.16 14:15, Michal Sekletar (msekl...@redhat.com) wrote:

> On Mon, Nov 7, 2016 at 1:20 PM, Daniel P. Berrange wrote:
> 
> > So if libvirt creates a private mount namespace for each QEMU and mounts
> > a custom /dev there, this is invisible to udev, and thus udev won't/can't
> > mess with permissions we set in our private /dev.
> >
> > For hotplug, the libvirt QEMU driver would do the same as the libvirt LXC
> > driver currently does. It would fork and setns() into the QEMU mount
> > namespace and run mknod()+chmod() there, before doing the rest of its
> > normal hotplug logic. See lxcDomainAttachDeviceMknodHelper() for what
> > LXC does.
> 
> We try to migrate people away from using mknod and messing with /dev/
> from user-space. For example, we had to deal with non-trivial problems
> wrt. mknod and the Veritas storage stack in the past (most of these issues
> remain unsolved to date). I don't like to hear that you plan to get
> into the /dev management business in libvirt too. I am judging based on
> past experience; nevertheless, I don't like this plan.

Well, I'd say: if people create their own /dev, they are welcome to do
whatever they want in it. They should just stay away from the host's
/dev, however, and not interfere with udev's own management of it.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] [libvirt] How to make udev not touch my device?

2016-11-11 Thread Daniel P. Berrange
On Fri, Nov 11, 2016 at 05:01:40PM +0100, Michal Sekletar wrote:
> On Fri, Nov 11, 2016 at 2:20 PM, Daniel P. Berrange wrote:
> 
> > What kind of issues?
> 
> The general problem with manually created device nodes is that udev and
> systemd do not know about them. Device units do not exist for these
> device nodes, hence they cannot be a dependency of some other unit.
> A typical example is a manually created device node referenced
> from /etc/fstab. The corresponding mount unit is then bound to a device
> unit that never shows up, and hence it always fails to mount even though
> the device node is there.

Ok, that sounds irrelevant to libvirt's usage wrt QEMU, so I don't
see any problem for us here.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|


Re: [systemd-devel] [libvirt] How to make udev not touch my device?

2016-11-11 Thread Michal Sekletar
On Fri, Nov 11, 2016 at 2:20 PM, Daniel P. Berrange wrote:

> What kind of issues?

The general problem with manually created device nodes is that udev and
systemd do not know about them. Device units do not exist for these
device nodes, hence they cannot be a dependency of some other unit.
A typical example is a manually created device node referenced
from /etc/fstab. The corresponding mount unit is then bound to a device
unit that never shows up, and hence it always fails to mount even though
the device node is there.
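
To make the fstab case concrete, here is a rough sketch of which unit
names are involved. It assumes systemd's documented path escaping (what
`systemd-escape --path` does) and only approximates it; the function
names, /dev/mydisk and /mnt/data are made up for illustration.

```python
def escape_path(path: str) -> str:
    """Approximate systemd's path escaping (cf. `systemd-escape --path`)."""
    # Drop leading, trailing and duplicate slashes.
    trimmed = "/".join(part for part in path.split("/") if part)
    if not trimmed:
        return "-"  # the root directory is special-cased
    out = []
    for i, byte in enumerate(trimmed.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")
        elif (ch.isascii() and ch.isalnum()) or ch in ":_" or (ch == "." and i > 0):
            out.append(ch)
        else:
            # Everything else, including "-" itself, becomes \xXX.
            out.append(f"\\x{byte:02x}")
    return "".join(out)


def device_unit(node: str) -> str:
    """Name of the device unit systemd would expect for a device node."""
    return escape_path(node) + ".device"


def mount_unit(mount_point: str) -> str:
    """Name of the mount unit generated for a mount point."""
    return escape_path(mount_point) + ".mount"


if __name__ == "__main__":
    # Hypothetical fstab line: /dev/mydisk  /mnt/data  ext4  defaults  0 2
    print(mount_unit("/mnt/data"), "waits for", device_unit("/dev/mydisk"))
```

If /dev/mydisk was created by hand with mknod, nothing ever announces
the device unit named by device_unit(), so the mount unit keeps waiting
for it even though the node exists.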

Michal


Re: [systemd-devel] [libvirt] How to make udev not touch my device?

2016-11-11 Thread Daniel P. Berrange
On Fri, Nov 11, 2016 at 02:15:38PM +0100, Michal Sekletar wrote:
> On Mon, Nov 7, 2016 at 1:20 PM, Daniel P. Berrange wrote:
> 
> > So if libvirt creates a private mount namespace for each QEMU and mounts
> > a custom /dev there, this is invisible to udev, and thus udev won't/can't
> > mess with permissions we set in our private /dev.
> >
> > For hotplug, the libvirt QEMU driver would do the same as the libvirt LXC
> > driver currently does. It would fork and setns() into the QEMU mount
> > namespace and run mknod()+chmod() there, before doing the rest of its
> > normal hotplug logic. See lxcDomainAttachDeviceMknodHelper() for what
> > LXC does.
> 
> We try to migrate people away from using mknod and messing with /dev/
> from user-space. For example, we had to deal with non-trivial problems
> wrt. mknod and the Veritas storage stack in the past (most of these issues

What kind of issues?

> remain unsolved to date). I don't like to hear that you plan to get
> into the /dev management business in libvirt too. I am judging based on
> past experience; nevertheless, I don't like this plan.

Libvirt is already doing this for its LXC driver, populating a private
/dev with only the devices permitted for the container in question.

> Also, managing a separate mount namespace for each qemu process and
> forking a helper that joins the namespace to do some work seems quite
> complex too.

Again, libvirt is already doing this for LXC, so it's not any great
burden.

Regards,
Daniel


Re: [systemd-devel] [libvirt] How to make udev not touch my device?

2016-11-11 Thread Michal Sekletar
On Mon, Nov 7, 2016 at 1:20 PM, Daniel P. Berrange wrote:

> So if libvirt creates a private mount namespace for each QEMU and mounts
> a custom /dev there, this is invisible to udev, and thus udev won't/can't
> mess with permissions we set in our private /dev.
>
> For hotplug, the libvirt QEMU driver would do the same as the libvirt LXC
> driver currently does. It would fork and setns() into the QEMU mount
> namespace and run mknod()+chmod() there, before doing the rest of its
> normal hotplug logic. See lxcDomainAttachDeviceMknodHelper() for what
> LXC does.

We try to migrate people away from using mknod and messing with /dev/
from user-space. For example, we had to deal with non-trivial problems
wrt. mknod and the Veritas storage stack in the past (most of these issues
remain unsolved to date). I don't like to hear that you plan to get
into the /dev management business in libvirt too. I am judging based on
past experience; nevertheless, I don't like this plan.

Also, managing a separate mount namespace for each qemu process and
forking a helper that joins the namespace to do some work seems quite
complex too.

Michal


Re: [systemd-devel] [libvirt] How to make udev not touch my device?

2016-11-07 Thread Daniel P. Berrange
On Mon, Nov 07, 2016 at 01:11:14PM +0100, Michal Privoznik wrote:
> On 07.11.2016 10:17, Daniel P. Berrange wrote:
> > On Fri, Nov 04, 2016 at 08:47:34AM +0100, Michal Privoznik wrote:
> >> Hey udev developers,
> >>
> >> I'm a libvirt developer and I've been facing an interesting issue
> >> recently. Libvirt is a library for managing virtual machines and as such
> >> allows basically any device to be exposed to a virtual machine. For
> >> instance, a virtual machine can use /dev/sdX as its own disk. For
> >> security reasons we allow users to configure their VMs to run under
> >> a different UID/GID and also SELinux context. That means that whenever a
> >> VM is being started up, libvirtd (our daemon) relabels all the
> >> necessary paths that the QEMU process (representing the VM) can touch.
> >> However, I'm facing an issue that I don't know how to fix. In some cases
> >> QEMU can close & reopen a block device. Closing a block device
> >> triggers a udev event, and if there is a rule that sets a security
> >> label on the device, the QEMU process is unable to reopen the device.
> >>
> >> My question is: what can we do to prevent udev from meddling with the
> >> security labels that we've set on the devices?
> >>
> >> One of the ideas our lead developer had was for libvirt to set some kind
> >> of udev label on devices managed by libvirt (when setting up security
> >> labels), and then whenever udev sees such a labelled device it won't touch
> >> it at all (this could be achieved by a rule perhaps?). Later, when the
> >> domain is shutting down, libvirt removes that label. But I don't think
> >> setting an arbitrary label on devices is supported, is it?
> > 
> > Having thought about this over the weekend, I'm strongly inclined to
> > just take udev out of the equation by starting a new mount namespace
> > for each QEMU we launch and setting up a custom /dev containing just
> > the devices we need. This will both improve security and avoid the
> > udev races, requires no complex code in libvirt, and will work for
> > libvirt all the way back to RHEL6.
> 
> How would this work with device hotplug? I.e. I start a domain with some
> set of devices, then I bring up an iSCSI target (which appears under
> /dev). How does one 'transfer' the device into the new namespace?
> BTW: can you elaborate more on udev-namespace relations? Doesn't udev
> run in the namespaces too?

A single process can only ever be in a single namespace of each type at
any point in time, and udev only ever runs in the initial namespaces. When
running containers you never have udev inside them, and udev certainly
doesn't interact with arbitrary namespaces created by other applications
for their own purposes.

So if libvirt creates a private mount namespace for each QEMU and mounts
a custom /dev there, this is invisible to udev, and thus udev won't/can't
mess with permissions we set in our private /dev.

For hotplug, the libvirt QEMU driver would do the same as the libvirt LXC
driver currently does. It would fork and setns() into the QEMU mount
namespace and run mknod()+chmod() there, before doing the rest of its
normal hotplug logic. See lxcDomainAttachDeviceMknodHelper() for what
LXC does.
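
For illustration only, the fork + setns() + mknod()+chmod() sequence
could be sketched roughly like this. It is written in Python rather than
libvirt's C, it is not what lxcDomainAttachDeviceMknodHelper() actually
does line by line, and every name in it (NodeSpec, read_node_spec,
create_node, add_node_in_namespace) is hypothetical. The namespace part
assumes root privileges and Python 3.12 or newer for os.setns().

```python
import os
import stat
from typing import NamedTuple


class NodeSpec(NamedTuple):
    """What is needed to recreate a device node somewhere else."""
    kind: int   # stat.S_IFBLK or stat.S_IFCHR
    major: int
    minor: int
    perms: int  # permission bits, e.g. 0o660


def read_node_spec(path: str) -> NodeSpec:
    """Read type, device numbers and permission bits of a node in the host /dev."""
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        kind = stat.S_IFBLK
    elif stat.S_ISCHR(st.st_mode):
        kind = stat.S_IFCHR
    else:
        raise ValueError(f"{path} is not a device node")
    return NodeSpec(kind, os.major(st.st_rdev), os.minor(st.st_rdev),
                    stat.S_IMODE(st.st_mode))


def create_node(dev_dir: str, name: str, spec: NodeSpec) -> str:
    """mknod()+chmod() the node under dev_dir (needs CAP_MKNOD)."""
    target = os.path.join(dev_dir, name)
    os.mknod(target, spec.kind | spec.perms, os.makedev(spec.major, spec.minor))
    os.chmod(target, spec.perms)  # mknod() honours the umask, so set the bits again
    return target


def add_node_in_namespace(qemu_pid: int, name: str, spec: NodeSpec) -> None:
    """Fork a helper that joins qemu_pid's mount namespace and creates the node there."""
    child = os.fork()
    if child == 0:
        status = 1
        try:
            fd = os.open(f"/proc/{qemu_pid}/ns/mnt", os.O_RDONLY)
            os.setns(fd, os.CLONE_NEWNS)  # "/dev" now refers to the private /dev
            os.close(fd)
            create_node("/dev", name, spec)
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code path
    _, wait_status = os.waitpid(child, 0)
    if os.waitstatus_to_exitcode(wait_status) != 0:
        raise RuntimeError("helper failed to create the device node")
```

The parent reads the spec from the host's /dev before forking, e.g.
add_node_in_namespace(pid, "sdb", read_node_spec("/dev/sdb")), and only
the short-lived child ever switches namespaces, so the daemon itself
stays in the host namespace.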

Regards,
Daniel


Re: [systemd-devel] [libvirt] How to make udev not touch my device?

2016-11-07 Thread Michal Privoznik
On 07.11.2016 10:17, Daniel P. Berrange wrote:
> On Fri, Nov 04, 2016 at 08:47:34AM +0100, Michal Privoznik wrote:
>> Hey udev developers,
>>
>> I'm a libvirt developer and I've been facing an interesting issue
>> recently. Libvirt is a library for managing virtual machines and as such
>> allows basically any device to be exposed to a virtual machine. For
>> instance, a virtual machine can use /dev/sdX as its own disk. For
>> security reasons we allow users to configure their VMs to run under
>> a different UID/GID and also SELinux context. That means that whenever a
>> VM is being started up, libvirtd (our daemon) relabels all the
>> necessary paths that the QEMU process (representing the VM) can touch.
>> However, I'm facing an issue that I don't know how to fix. In some cases
>> QEMU can close & reopen a block device. Closing a block device
>> triggers a udev event, and if there is a rule that sets a security
>> label on the device, the QEMU process is unable to reopen the device.
>>
>> My question is: what can we do to prevent udev from meddling with the
>> security labels that we've set on the devices?
>>
>> One of the ideas our lead developer had was for libvirt to set some kind
>> of udev label on devices managed by libvirt (when setting up security
>> labels), and then whenever udev sees such a labelled device it won't touch
>> it at all (this could be achieved by a rule perhaps?). Later, when the
>> domain is shutting down, libvirt removes that label. But I don't think
>> setting an arbitrary label on devices is supported, is it?
> 
> Having thought about this over the weekend, I'm strongly inclined to
> just take udev out of the equation by starting a new mount namespace
> for each QEMU we launch and setting up a custom /dev containing just
> the devices we need. This will both improve security and avoid the
> udev races, requires no complex code in libvirt, and will work for
> libvirt all the way back to RHEL6.

How would this work with device hotplug? I.e. I start a domain with some
set of devices, then I bring up an iSCSI target (which appears under
/dev). How does one 'transfer' the device into the new namespace?
BTW: can you elaborate more on udev-namespace relations? Doesn't udev
run in the namespaces too?

Michal


Re: [systemd-devel] [libvirt] How to make udev not touch my device?

2016-11-07 Thread Daniel P. Berrange
On Fri, Nov 04, 2016 at 08:47:34AM +0100, Michal Privoznik wrote:
> Hey udev developers,
> 
> I'm a libvirt developer and I've been facing an interesting issue
> recently. Libvirt is a library for managing virtual machines and as such
> allows basically any device to be exposed to a virtual machine. For
> instance, a virtual machine can use /dev/sdX as its own disk. For
> security reasons we allow users to configure their VMs to run under
> a different UID/GID and also SELinux context. That means that whenever a
> VM is being started up, libvirtd (our daemon) relabels all the
> necessary paths that the QEMU process (representing the VM) can touch.
> However, I'm facing an issue that I don't know how to fix. In some cases
> QEMU can close & reopen a block device. Closing a block device
> triggers a udev event, and if there is a rule that sets a security
> label on the device, the QEMU process is unable to reopen the device.
> 
> My question is: what can we do to prevent udev from meddling with the
> security labels that we've set on the devices?
> 
> One of the ideas our lead developer had was for libvirt to set some kind
> of udev label on devices managed by libvirt (when setting up security
> labels), and then whenever udev sees such a labelled device it won't touch
> it at all (this could be achieved by a rule perhaps?). Later, when the
> domain is shutting down, libvirt removes that label. But I don't think
> setting an arbitrary label on devices is supported, is it?

Having thought about this over the weekend, I'm strongly inclined to
just take udev out of the equation by starting a new mount namespace
for each QEMU we launch and setting up a custom /dev containing just
the devices we need. This will both improve security and avoid the
udev races, requires no complex code in libvirt, and will work for
libvirt all the way back to RHEL6.

Regards,
Daniel