Re: [systemd-devel] [libvirt] How to make udev not touch my device?
On Mon, 07.11.16 09:17, Daniel P. Berrange (berra...@redhat.com) wrote:

> On Fri, Nov 04, 2016 at 08:47:34AM +0100, Michal Privoznik wrote:
> > Hey udev developers,
> >
> > I'm a libvirt developer and I've been facing an interesting issue
> > recently. Libvirt is a library for managing virtual machines and as such
> > allows basically any device to be exposed to a virtual machine. For
> > instance, a virtual machine can use /dev/sdX as its own disk. Because of
> > security reasons we allow users to configure their VMs to run under
> > different UID/GID and also SELinux context. That means that whenever a
> > VM is being started up, libvirtd (our daemon we have) relabels all the
> > necessary paths that QEMU process (representing VM) can touch.
> > However, I'm facing an issue that I don't know how to fix. In some cases
> > QEMU can close & reopen a block device. However, closing a block device
> > triggers an event and hence if there is a rule that sets a security
> > label on a device the QEMU process is unable to reopen the device again.
> >
> > My question is, what we can do to prevent udev from mangling with our
> > security labels that we've set on the devices?
> >
> > One of the ideas our lead developer had was for libvirt to set some kind
> > of udev label on devices managed by libvirt (when setting up security
> > labels) and then whenever udev sees such labelled device it won't touch
> > it at all (this could be achieved by a rule perhaps?). Later, when
> > domain is shutting down libvirt removes that label. But I don't think
> > setting an arbitrary label on devices is supported, is it?
>
> Having thought about this over the weekend, I'm strongly inclined to
> just take udev out of the equation by starting a new mount namespace
> for each QEMU we launch and setting up a custom /dev containing just
> the devices we need. This will be both a security improvement and
> avoid the udev races, with no complex code required in libvirt and
> will work for libvirt all the way back to RHEL6.

I think this would be a pretty nice solution, indeed!

Lennart

--
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [libvirt] How to make udev not touch my device?
On Fri, 11.11.16 14:15, Michal Sekletar (msekl...@redhat.com) wrote:

> On Mon, Nov 7, 2016 at 1:20 PM, Daniel P. Berrange wrote:
>
> > So if libvirt creates a private mount namespace for each QEMU and mounts
> > a custom /dev there, this is invisible to udev, and thus udev won't/can't
> > mess with permissions we set in our private /dev.
> >
> > For hotplug, the libvirt QEMU would do the same as the libvirt LXC driver
> > currently does. It would fork and setns() into the QEMU mount namespace
> > and run mknod()+chmod() there, before doing the rest of its normal hotplug
> > logic. See lxcDomainAttachDeviceMknodHelper() for what LXC does.
>
> We try to migrate people away from using mknod and messing with /dev/
> from user-space. For example, we had to deal with non-trivial problems
> wrt. mknod and Veritas storage stack in the past (most of these issues
> remain unsolved to date). I don't like to hear that you plan to get
> into /dev management business in libvirt too. I am judging based on
> past experiences, nevertheless, I don't like this plan.

Well, I'd say: if people create their own /dev, they are welcome to do
in it whatever they want. They should just stay away from the host's
/dev however, and not interfere with udev's own managing of that.

Lennart

--
Lennart Poettering, Red Hat
Re: [systemd-devel] [libvirt] How to make udev not touch my device?
On Fri, Nov 11, 2016 at 05:01:40PM +0100, Michal Sekletar wrote:
> On Fri, Nov 11, 2016 at 2:20 PM, Daniel P. Berrange wrote:
>
> > What kind of issues ?
>
> General problem with manually created device nodes is that udev and
> systemd do not know about them. Device units do not exist for these
> device nodes. Hence these device units can not be a dependency of some
> other unit. Typical example is manually created device node referenced
> from /etc/fstab. Then corresponding mount unit is bound to a device
> that never shows up and hence it always fails to mount even though
> device node is there.

Ok, that sounds irrelevant to libvirt's usage wrt QEMU, so I don't see
any problem for us here.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
Re: [systemd-devel] [libvirt] How to make udev not touch my device?
On Fri, Nov 11, 2016 at 2:20 PM, Daniel P. Berrange wrote:

> What kind of issues ?

General problem with manually created device nodes is that udev and
systemd do not know about them. Device units do not exist for these
device nodes. Hence these device units can not be a dependency of some
other unit. Typical example is manually created device node referenced
from /etc/fstab. Then corresponding mount unit is bound to a device
that never shows up and hence it always fails to mount even though
device node is there.

Michal
Re: [systemd-devel] [libvirt] How to make udev not touch my device?
On Fri, Nov 11, 2016 at 02:15:38PM +0100, Michal Sekletar wrote:
> On Mon, Nov 7, 2016 at 1:20 PM, Daniel P. Berrange wrote:
>
> > So if libvirt creates a private mount namespace for each QEMU and mounts
> > a custom /dev there, this is invisible to udev, and thus udev won't/can't
> > mess with permissions we set in our private /dev.
> >
> > For hotplug, the libvirt QEMU would do the same as the libvirt LXC driver
> > currently does. It would fork and setns() into the QEMU mount namespace
> > and run mknod()+chmod() there, before doing the rest of its normal hotplug
> > logic. See lxcDomainAttachDeviceMknodHelper() for what LXC does.
>
> We try to migrate people away from using mknod and messing with /dev/
> from user-space. For example, we had to deal with non-trivial problems
> wrt. mknod and Veritas storage stack in the past (most of these issues

What kind of issues ?

> remain unsolved to date). I don't like to hear that you plan to get
> into /dev management business in libvirt too. I am judging based on
> past experiences, nevertheless, I don't like this plan.

Libvirt is already doing this for its LXC driver, populating a private
/dev with only the devices permitted for the container in question.

> Also, managing separate mount namespace for each qemu process and
> forking helper that joins the namespace to do some work seems quite
> complex too.

Again, libvirt is already doing this for LXC so it's not any great burden.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
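[Editor's sketch] For readers unfamiliar with the fork + setns() + mknod() hotplug pattern discussed in this message, here is a rough Python illustration. It is not libvirt's code and not what lxcDomainAttachDeviceMknodHelper() actually contains; the function name mknod_in_namespace and its parameters are made up for this example. It assumes Linux, a mounted /proc, and enough privilege (CAP_SYS_ADMIN, CAP_SYS_CHROOT, CAP_MKNOD) in the caller. SELinux labelling is left out.

```python
import ctypes
import os
import stat

CLONE_NEWNS = 0x00020000  # mount namespace flag from <sched.h>


def mknod_in_namespace(qemu_pid, path, uid, gid):
    """Recreate the host device node `path` inside the mount namespace of
    `qemu_pid`. Returns True on success, False if the helper child failed."""
    # Read type and major:minor from the host's /dev before switching namespaces.
    st = os.stat(path)
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        raise ValueError(f"{path} is not a device node")

    child = os.fork()
    if child == 0:
        # Helper child: single-threaded, so it is allowed to change mount namespace.
        status = 1
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            fd = os.open(f"/proc/{qemu_pid}/ns/mnt", os.O_RDONLY)
            if libc.setns(fd, CLONE_NEWNS) != 0:
                raise OSError(ctypes.get_errno(), "setns failed")
            # From here on, `path` resolves inside the private /dev.
            os.mknod(path, stat.S_IFMT(st.st_mode) | 0o600, st.st_rdev)
            os.chown(path, uid, gid)
            os.chmod(path, 0o600)  # mknod mode is subject to umask, so set it explicitly
            status = 0
        except Exception:
            pass
        finally:
            os._exit(status)  # never return into the parent's code path

    _, wstatus = os.waitpid(child, 0)
    return os.WIFEXITED(wstatus) and os.WEXITSTATUS(wstatus) == 0
```

The parent never leaves its own namespace; only the short-lived child joins the target one, which is why the fork is there at all. After a successful call the management daemon would carry on with its normal hotplug logic.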
Re: [systemd-devel] [libvirt] How to make udev not touch my device?
On Mon, Nov 7, 2016 at 1:20 PM, Daniel P. Berrange wrote:

> So if libvirt creates a private mount namespace for each QEMU and mounts
> a custom /dev there, this is invisible to udev, and thus udev won't/can't
> mess with permissions we set in our private /dev.
>
> For hotplug, the libvirt QEMU would do the same as the libvirt LXC driver
> currently does. It would fork and setns() into the QEMU mount namespace
> and run mknod()+chmod() there, before doing the rest of its normal hotplug
> logic. See lxcDomainAttachDeviceMknodHelper() for what LXC does.

We try to migrate people away from using mknod and messing with /dev/
from user-space. For example, we had to deal with non-trivial problems
wrt. mknod and Veritas storage stack in the past (most of these issues
remain unsolved to date). I don't like to hear that you plan to get
into /dev management business in libvirt too. I am judging based on
past experiences, nevertheless, I don't like this plan.

Also, managing separate mount namespace for each qemu process and
forking helper that joins the namespace to do some work seems quite
complex too.

Michal
Re: [systemd-devel] [libvirt] How to make udev not touch my device?
On Mon, Nov 07, 2016 at 01:11:14PM +0100, Michal Privoznik wrote:
> On 07.11.2016 10:17, Daniel P. Berrange wrote:
> > On Fri, Nov 04, 2016 at 08:47:34AM +0100, Michal Privoznik wrote:
> >> Hey udev developers,
> >>
> >> I'm a libvirt developer and I've been facing an interesting issue
> >> recently. Libvirt is a library for managing virtual machines and as such
> >> allows basically any device to be exposed to a virtual machine. For
> >> instance, a virtual machine can use /dev/sdX as its own disk. Because of
> >> security reasons we allow users to configure their VMs to run under
> >> different UID/GID and also SELinux context. That means that whenever a
> >> VM is being started up, libvirtd (our daemon we have) relabels all the
> >> necessary paths that QEMU process (representing VM) can touch.
> >> However, I'm facing an issue that I don't know how to fix. In some cases
> >> QEMU can close & reopen a block device. However, closing a block device
> >> triggers an event and hence if there is a rule that sets a security
> >> label on a device the QEMU process is unable to reopen the device again.
> >>
> >> My question is, what we can do to prevent udev from mangling with our
> >> security labels that we've set on the devices?
> >>
> >> One of the ideas our lead developer had was for libvirt to set some kind
> >> of udev label on devices managed by libvirt (when setting up security
> >> labels) and then whenever udev sees such labelled device it won't touch
> >> it at all (this could be achieved by a rule perhaps?). Later, when
> >> domain is shutting down libvirt removes that label. But I don't think
> >> setting an arbitrary label on devices is supported, is it?
> >
> > Having thought about this over the weekend, I'm strongly inclined to
> > just take udev out of the equation by starting a new mount namespace
> > for each QEMU we launch and setting up a custom /dev containing just
> > the devices we need. This will be both a security improvement and
> > avoid the udev races, with no complex code required in libvirt and
> > will work for libvirt all the way back to RHEL6.
>
> How would this work with device hotplug, i.e. I start a domain with some
> set of devices. Then I bring up an iSCSI target (which appears under
> /dev) and how does one 'transfer' the device into the new namespace?
> BTW: can you elaborate more on udev-namespace relations? Doesn't udev
> run in the namespaces too?

A single process can only ever be in a single namespace at any point
in time and udev only ever runs in the initial namespaces. When running
containers you never have udev inside them, and udev certainly doesn't
interact with arbitrary namespaces created by other applications for
their own purposes.

So if libvirt creates a private mount namespace for each QEMU and mounts
a custom /dev there, this is invisible to udev, and thus udev won't/can't
mess with permissions we set in our private /dev.

For hotplug, the libvirt QEMU would do the same as the libvirt LXC driver
currently does. It would fork and setns() into the QEMU mount namespace
and run mknod()+chmod() there, before doing the rest of its normal hotplug
logic. See lxcDomainAttachDeviceMknodHelper() for what LXC does.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
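[Editor's sketch] To make the "private mount namespace with a custom /dev" idea from this message concrete, here is a minimal Python sketch of the start-up side. It is only an illustration under stated assumptions, not libvirt's implementation: the names device_spec and setup_private_dev are invented here, the code assumes Linux with glibc, and it would have to run with CAP_SYS_ADMIN and CAP_MKNOD in a freshly forked child just before exec'ing QEMU. It also ignores things a real /dev needs (/dev/pts, /dev/shm, SELinux labels).

```python
import ctypes
import os
import stat

CLONE_NEWNS = 0x00020000  # <sched.h>
MS_REC = 0x4000           # <sys/mount.h>
MS_PRIVATE = 1 << 18      # <sys/mount.h>


def device_spec(path):
    """Return (file type bits, device number) of an existing device node."""
    st = os.stat(path)
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        raise ValueError(f"{path} is not a device node")
    return stat.S_IFMT(st.st_mode), st.st_rdev


def setup_private_dev(allowed, uid, gid):
    """Give the calling process its own /dev holding only the `allowed` nodes."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_char_p]

    def check(ret, what):
        if ret != 0:
            err = ctypes.get_errno()
            raise OSError(err, f"{what}: {os.strerror(err)}")

    # Record type and major:minor from the host /dev before it gets hidden.
    specs = {p: device_spec(p) for p in allowed}

    check(libc.unshare(CLONE_NEWNS), "unshare")
    # Keep our mounts from propagating back into the host namespace.
    check(libc.mount(b"none", b"/", None, MS_REC | MS_PRIVATE, None), "make / private")
    # An empty tmpfs now covers the udev-managed /dev, for this namespace only.
    check(libc.mount(b"tmpfs", b"/dev", b"tmpfs", 0, b"mode=755"), "mount /dev")

    for path, (ftype, rdev) in specs.items():
        os.makedirs(os.path.dirname(path), exist_ok=True)
        os.mknod(path, ftype | 0o600, rdev)
        os.chown(path, uid, gid)
        os.chmod(path, 0o600)  # mknod mode is subject to umask, so set it explicitly
```

Because udev only acts on the host's /dev, ownership set on these private nodes is not reset when the host device emits a change event. A caller would use something like setup_private_dev(["/dev/null", "/dev/sdX"], uid, gid), where the device list and IDs come from the domain configuration.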
Re: [systemd-devel] [libvirt] How to make udev not touch my device?
On 07.11.2016 10:17, Daniel P. Berrange wrote:
> On Fri, Nov 04, 2016 at 08:47:34AM +0100, Michal Privoznik wrote:
>> Hey udev developers,
>>
>> I'm a libvirt developer and I've been facing an interesting issue
>> recently. Libvirt is a library for managing virtual machines and as such
>> allows basically any device to be exposed to a virtual machine. For
>> instance, a virtual machine can use /dev/sdX as its own disk. Because of
>> security reasons we allow users to configure their VMs to run under
>> different UID/GID and also SELinux context. That means that whenever a
>> VM is being started up, libvirtd (our daemon we have) relabels all the
>> necessary paths that QEMU process (representing VM) can touch.
>> However, I'm facing an issue that I don't know how to fix. In some cases
>> QEMU can close & reopen a block device. However, closing a block device
>> triggers an event and hence if there is a rule that sets a security
>> label on a device the QEMU process is unable to reopen the device again.
>>
>> My question is, what we can do to prevent udev from mangling with our
>> security labels that we've set on the devices?
>>
>> One of the ideas our lead developer had was for libvirt to set some kind
>> of udev label on devices managed by libvirt (when setting up security
>> labels) and then whenever udev sees such labelled device it won't touch
>> it at all (this could be achieved by a rule perhaps?). Later, when
>> domain is shutting down libvirt removes that label. But I don't think
>> setting an arbitrary label on devices is supported, is it?
>
> Having thought about this over the weekend, I'm strongly inclined to
> just take udev out of the equation by starting a new mount namespace
> for each QEMU we launch and setting up a custom /dev containing just
> the devices we need. This will be both a security improvement and
> avoid the udev races, with no complex code required in libvirt and
> will work for libvirt all the way back to RHEL6.

How would this work with device hotplug, i.e. I start a domain with some
set of devices. Then I bring up an iSCSI target (which appears under
/dev) and how does one 'transfer' the device into the new namespace?
BTW: can you elaborate more on udev-namespace relations? Doesn't udev
run in the namespaces too?

Michal
Re: [systemd-devel] [libvirt] How to make udev not touch my device?
On Fri, Nov 04, 2016 at 08:47:34AM +0100, Michal Privoznik wrote:
> Hey udev developers,
>
> I'm a libvirt developer and I've been facing an interesting issue
> recently. Libvirt is a library for managing virtual machines and as such
> allows basically any device to be exposed to a virtual machine. For
> instance, a virtual machine can use /dev/sdX as its own disk. Because of
> security reasons we allow users to configure their VMs to run under
> different UID/GID and also SELinux context. That means that whenever a
> VM is being started up, libvirtd (our daemon we have) relabels all the
> necessary paths that QEMU process (representing VM) can touch.
> However, I'm facing an issue that I don't know how to fix. In some cases
> QEMU can close & reopen a block device. However, closing a block device
> triggers an event and hence if there is a rule that sets a security
> label on a device the QEMU process is unable to reopen the device again.
>
> My question is, what we can do to prevent udev from mangling with our
> security labels that we've set on the devices?
>
> One of the ideas our lead developer had was for libvirt to set some kind
> of udev label on devices managed by libvirt (when setting up security
> labels) and then whenever udev sees such labelled device it won't touch
> it at all (this could be achieved by a rule perhaps?). Later, when
> domain is shutting down libvirt removes that label. But I don't think
> setting an arbitrary label on devices is supported, is it?

Having thought about this over the weekend, I'm strongly inclined to
just take udev out of the equation by starting a new mount namespace
for each QEMU we launch and setting up a custom /dev containing just
the devices we need. This will be both a security improvement and
avoid the udev races, with no complex code required in libvirt and
will work for libvirt all the way back to RHEL6.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|