Re: [systemd-devel] Creating executable device nodes in /dev?
On Wed, Dec 9, 2020 at 11:22 AM Topi Miettinen wrote: > > On 9.12.2020 17.14, Andy Lutomirski wrote: > > > Maybe also malware which can escape all means of detection, enforced by > the CPU? Though I don't know if any malware scanners for Linux work can > check for fileless, memory only malware. I don't think this is really relevant to malware detection. You can't do syscalls from SGX code, for example, and, even if you could, malware behavior analysis would be unaffected. The concern seems to be more that, once someone has discovered some malware, if it's protected by SGX then it's plausible that you can't disassemble it. > > > > > In Intel’s original vision, only specially licensed vendors could create > > SGX software, but Linux pushed back against this quite hard, and new CPUs > > allow unlicensed enclaves. So your Skylake CPUs support SGX, but not on > > Linux. > > Kudos to Linux for the push. :) I don't know if Linux gets full credit for this, but I think we at least had some impact. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Creating executable device nodes in /dev?
> On Dec 9, 2020, at 12:58 AM, Topi Miettinen wrote: > > On 9.12.2020 2.42, Jarkko Sakkinen wrote: >>> On Wed, Dec 09, 2020 at 02:15:28AM +0200, Jarkko Sakkinen wrote: >>> On Wed, Dec 09, 2020 at 01:15:27AM +0200, Topi Miettinen wrote: >>> As a further argument, I just did this on a Fedora system: >>> $ find /dev -perm /ugo+x -a \! -type d -a \! -type l >>> No results. So making /dev noexec doesn't seem to have any benefit. >> >> It's no surprise that there aren't any executables in /dev since >> removing MAKEDEV ages ago. That's not the issue, which is that >> /dev is a writable directory (for UID=0 but no capabilities are >> needed) and thus a potential location for constructing unapproved >> executables if it is also mounted exec (W^X). > > UID 0 can just change mount options, though, unless SELinux or similar is > used. And SELinux can protect /dev just fine without noexec. Well, mounting would need CAP_SYS_ADMIN in addition to UID 0. Also SELinux is not universal and the policies might not contain all users or services. -Topi >>> >>> What's the data that supports having noexec /dev anyway? With root >>> access I can then just use something else like /dev/shm mount. >>> >>> Has there been out in the wild real world cases that noexec mount >>> of would have prevented? >> Typo: "of" = "of /dev" >>> For me this sounds a lot just something that "feels more secure" >>> without any measurable benefit. Can you prove me wrong? >> The debate is circled around something not well defined. Of course you >> get theoretically more safe system when you decrease priviliges anywhere >> in the system. Like you could start do grazy things with stuff that >> unprivilged user has access, in order to prevent malware to elevate to >> UID 0 in the first place. >> I think where this go intellectually wrong is that we are talking about >> *default installation* of a distribution. That should have somewhat sane >> common sense access control settings. For like a normal desktop user >> noexec /dev will not do any possible favor. >> Then there is the case when you want to harden installation for an >> application, let's' say some server. In that case you will anyway >> fine-tune the security settings and go grazy enough with hardening. >> When you tailor a server, it's a standard practice to enumerate and >> adjust the mount points if needed. > > I think we agree that there's a need for either way to allow SGX (if default > is hardened) or a hardening option (in the opposite case). For systemd I see > two approaches: > > 1. Default noexec /dev, override with something like > - ExecPaths=/dev > - MountOptions=/dev:rw,exec,dev,nosuid > - or even MountOptions=/dev/sgx:rw,exec,dev,nosuid > - ProtectDev=no > - AllowSGX=yes > > 2. Default exec /dev, override with > - NoExecPaths=/dev > - MountOptions=/dev:rw,noexec,dev,nosuid > - ProtectDev=yes > - DenySGX=yes > > I'd prefer 1. but of course 2. would be reasonable. I would argue for 2, for the following reason. I absolutely agree that hardening a system by making it impossible to create executable code dynamically is valuable, but I don’t think it’s a good default. By default, programs like gcc and clang should work, but so should JITs, and JITs are getting more popular and powerful all the time. In a default setting that allows JITs, etc, I see no benefit at all to making /dev noexec. To the contrary, making /dev noexec seems like plugging a little restricted corner of code creation (because it requires UID=0) while allowing the easy ways (/tmp, /home, /dev/shm, unshare(2), mmap(), etc). By all means let admins harden this, but I see no reason to apply some of the hardening when the rest is disabled. > >> To summarize, I neither understand the intended target audience. > > We have something in common: me neither. What's the target audience for SGX? > What's the use case? What are the users: browsers, system services? How would > applications use SGX? Should udev rules for /dev/sgx make it available to any > logged in users with uaccess tags? > > I would certainly like it to be available to all software, with the possible exception of extra-hardened systems. Using SGX is not really an interesting attack surface. The main threat is that malware might use SGX to make itself hard to reverse engineer. In Intel’s original vision, only specially licensed vendors could create SGX software, but Linux pushed back against this quite hard, and new CPUs allow unlicensed enclaves. So your Skylake CPUs support SGX, but not on Linux. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Creating executable device nodes in /dev?
> On Dec 8, 2020, at 12:45 PM, Topi Miettinen wrote: > > On 8.12.2020 20.07, Andy Lutomirski wrote: >>> On Thu, Nov 19, 2020 at 10:05 AM Topi Miettinen wrote: >>> >>> On 19.11.2020 18.32, Zbigniew Jędrzejewski-Szmek wrote: >>>> On Thu, Nov 19, 2020 at 08:17:08AM -0800, Andy Lutomirski wrote: >>>>> Hi udev people- >>>>> >>>>> The upcoming Linux SGX driver has a device node /dev/sgx. User code >>>>> opens it, does various setup things, mmaps it, and needs to be able to >>>>> create PROT_EXEC mappings. This gets quite awkward if /dev is mounted >>>>> noexec. >>>>> >>>>> Can udev arrange to make a device node executable on distros that make >>>>> /dev noexec? This could be done by bind-mounting from an exec tmpfs. >>>>> Alternatively, the kernel could probably learn to ignore noexec on >>>>> /dev/sgx, but that seems a little bit evil. >>>> >>>> I'd be inclined to simply drop noexec from /dev by default. >>>> We don't do noexec on either /tmp or /dev/shm (because that causes >>>> immediate >>>> problems with stuff like Java and cffi). And if you have those two at your >>>> disposal anyway, having noexec on /dev doesn't seem important. >>> >>> I'd propose to not enable exec globally, but if a service needs SGX, it >>> could use something like MountOptions=/dev:exec only in those cases >>> where it's needed. That way it's possible to disallow writable and >>> executable file systems for most services (which typically don't need >>> /tmp or /dev/shm either). Of course the opposite >>> (MountOptions=/dev:noexec) would be also possible, but I'd expect that >>> this would be needed to be used more often. >>> >> I imagine the opposite would be more sensible. It seems odd to me >> that we would want any SGX-using service to require both special mount >> options and regular ACL permissions. > > How common are thes SGX-using services? Will every service start using it > without any special measures taken on it's behalf, or perhaps only a special > SGX control tool needs access? What about unprivileged user applications, do > they ever want to access SGX? Could something like Widevine deep in a browser > need to talk to SGX in a DRM scheme? I honestly don’t know. Widevine is probably some unholy mess of SGX and ME crud. But regular user programs may well end up using SGX for little non-evil enclaves, e.g. storing their keys securely. It would be nice if unprivileged enclaves just work as long as the use has appropriate permissions on the device nodes. SGX adoption has been severely hampered by the massive series of recent vulnerabilities and by Intel’s silly licensing scheme. The latter won’t be supported upstream. > >> As a further argument, I just did this on a Fedora system: >> $ find /dev -perm /ugo+x -a \! -type d -a \! -type l >> No results. So making /dev noexec doesn't seem to have any benefit. > > It's no surprise that there aren't any executables in /dev since removing > MAKEDEV ages ago. That's not the issue, which is that /dev is a writable > directory (for UID=0 but no capabilities are needed) and thus a potential > location for constructing unapproved executables if it is also mounted exec > (W^X). UID 0 can just change mount options, though, unless SELinux or similar is used. And SELinux can protect /dev just fine without noexec. > > -Topi ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Creating executable device nodes in /dev?
On Thu, Nov 19, 2020 at 10:05 AM Topi Miettinen wrote: > > On 19.11.2020 18.32, Zbigniew Jędrzejewski-Szmek wrote: > > On Thu, Nov 19, 2020 at 08:17:08AM -0800, Andy Lutomirski wrote: > >> Hi udev people- > >> > >> The upcoming Linux SGX driver has a device node /dev/sgx. User code > >> opens it, does various setup things, mmaps it, and needs to be able to > >> create PROT_EXEC mappings. This gets quite awkward if /dev is mounted > >> noexec. > >> > >> Can udev arrange to make a device node executable on distros that make > >> /dev noexec? This could be done by bind-mounting from an exec tmpfs. > >> Alternatively, the kernel could probably learn to ignore noexec on > >> /dev/sgx, but that seems a little bit evil. > > > > I'd be inclined to simply drop noexec from /dev by default. > > We don't do noexec on either /tmp or /dev/shm (because that causes immediate > > problems with stuff like Java and cffi). And if you have those two at your > > disposal anyway, having noexec on /dev doesn't seem important. > > I'd propose to not enable exec globally, but if a service needs SGX, it > could use something like MountOptions=/dev:exec only in those cases > where it's needed. That way it's possible to disallow writable and > executable file systems for most services (which typically don't need > /tmp or /dev/shm either). Of course the opposite > (MountOptions=/dev:noexec) would be also possible, but I'd expect that > this would be needed to be used more often. > I imagine the opposite would be more sensible. It seems odd to me that we would want any SGX-using service to require both special mount options and regular ACL permissions. As a further argument, I just did this on a Fedora system: $ find /dev -perm /ugo+x -a \! -type d -a \! -type l No results. So making /dev noexec doesn't seem to have any benefit. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Creating executable device nodes in /dev?
Hi udev people- The upcoming Linux SGX driver has a device node /dev/sgx. User code opens it, does various setup things, mmaps it, and needs to be able to create PROT_EXEC mappings. This gets quite awkward if /dev is mounted noexec. Can udev arrange to make a device node executable on distros that make /dev noexec? This could be done by bind-mounting from an exec tmpfs. Alternatively, the kernel could probably learn to ignore noexec on /dev/sgx, but that seems a little bit evil. Thanks, Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] kdbus refactoring?
On Mon, Nov 9, 2015 at 9:07 AM, Greg KH <gre...@linuxfoundation.org> wrote: > On Mon, Nov 09, 2015 at 05:02:45PM +, Måns Rullgård wrote: >> Andy Lutomirski <l...@amacapital.net> writes: >> >> > On Sun, Nov 8, 2015 at 3:30 PM, Greg KH <gre...@linuxfoundation.org> wrote: >> >> On Sun, Nov 08, 2015 at 10:39:43PM +0100, Richard Weinberger wrote: >> >>> On Sun, Nov 8, 2015 at 10:35 PM, Greg KH <gre...@linuxfoundation.org> >> >>> wrote: >> >>> > On Sun, Nov 08, 2015 at 10:06:31PM +0100, Richard Weinberger wrote: >> >>> >> Hi all, >> >>> >> >> >>> >> after reading on the removal of kdbus from Rawhide[1] I've searched >> >>> >> the mailinglist archives for more details but didn't find anything. >> >>> >> So, what are your plans? >> >>> >> >> >>> >> [1] >> >>> >> https://lists.fedoraproject.org/pipermail/kernel/2015-October/006011.html >> >>> > >> >>> > As that link said, based on the result of the code being in Rawhide, it >> >>> > is now being reworked / redesigned. The result will be posted for >> >>> > review "when it's ready". >> >>> >> >>> If you rework/redesign something you have to know what you want to >> >>> change. >> >>> That's why I was asking for the plan... >> >> >> >> Since when do people post "plans" or "design documents" on lkml without >> >> real code? Again, code will be posted when it's ready, like any other >> >> kernel submission. >> > >> > I ask for feedback on ideas and designs on a fairly regular basis. I >> > even frequently get valuable feedback :) >> > >> > I would like to think that the kernel community would have something >> > of value to add to the process of designing and implementing a major >> > new IPC mechanism. >> >> The "trust us, we'll show it when it's ready" attitude reminds me of the >> controversial TPP and TTIP negotiations. > > Ok, that's just trolling, cut it out. > > When something is even in the "hey look, it works, here's the big > changes from last time", we will of course post it, but right now, > things are being totally revisited based on the feedback we have > received so far. Give people a chance to recover from conferences and > then get back to work... > I hate to say this, but this approach to receiving feedback makes me really dislike the process. I read a fairly large fraction of the kdbus code. I found what I perceived to be issues, and I spoke up. I was told for quite a while that the authors disagreed that the issues I found were issues and that my assessment of the security aspects of the code was correct. Now the submission has been withdrawn (because of feedback received so far? from me?) and there will apparently be a new submission out of the blue, allegedly based on feedback. As a developer, I'm willing to ask for feedback on ideas and to ask for feedback on code. In many cases, I've gotten (correct!) feedback telling me that my design is wrong or needs major changes. I *always* try to answer such feedback respectfully and, if the reviewers are right (which they usually are), I won't keep shoving code they don't like in their face. In fact, IIRC I got my start as a serious x86 developer when I wrote some code and tglx told me that the way I designed it was unacceptable for upstream. In response, I thought about what the issues were, asked some questions, and rewrite the majority of the code. I think I learned a lot from the process, and the code was vastly improved as a result. If I'd sent substantially the same patch series three or four more times and then declared that I was withdrawing it without commenting directly on what I'd changed, I really doubt that anyone would have taken my next submission seriously. Please understand that kdbus' approach to receiving feedback is very off-putting. Fortunately the vast majority of kernel developers receive feedback for graciously and transparently, because otherwise I'd probably just never review anything. Frankly, if I were in the chain of people through whom the kdbus code would flow to an eventual home in Linus' tree, I would just say that the developers have used up my patience as a reviewer and the onus would be on the developers to demonstrate that it's worth my time to continue thinking about the code. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] kdbus refactoring?
On Sun, Nov 8, 2015 at 3:30 PM, Greg KHwrote: > On Sun, Nov 08, 2015 at 10:39:43PM +0100, Richard Weinberger wrote: >> On Sun, Nov 8, 2015 at 10:35 PM, Greg KH wrote: >> > On Sun, Nov 08, 2015 at 10:06:31PM +0100, Richard Weinberger wrote: >> >> Hi all, >> >> >> >> after reading on the removal of kdbus from Rawhide[1] I've searched >> >> the mailinglist archives for more details but didn't find anything. >> >> So, what are your plans? >> >> >> >> [1] >> >> https://lists.fedoraproject.org/pipermail/kernel/2015-October/006011.html >> > >> > As that link said, based on the result of the code being in Rawhide, it >> > is now being reworked / redesigned. The result will be posted for >> > review "when it's ready". >> >> If you rework/redesign something you have to know what you want to change. >> That's why I was asking for the plan... > > Since when do people post "plans" or "design documents" on lkml without > real code? Again, code will be posted when it's ready, like any other > kernel submission. I ask for feedback on ideas and designs on a fairly regular basis. I even frequently get valuable feedback :) I would like to think that the kernel community would have something of value to add to the process of designing and implementing a major new IPC mechanism. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Add ambient capability support to execution environment config?
For non-root services, getting Capabilities= and CapabilityBoundingSet= to do anything useful is rather tricky. Would it make sense to add AmbientCapabilities= to set ambient (and, implicitly, inheritable) capabilities, which will be available in Linux 4.3? Alternatively, there could be a boolean option to change the meaning of Capabilities so that it uses ambient capabilities instead of whatever it currently does. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Apr 20, 2015 7:57 AM, Lennart Poettering lenn...@poettering.net wrote: On Fri, 17.04.15 09:14, Andy Lutomirski (l...@amacapital.net) wrote: My point here is that there's no real shortage of downsides to this scheme, and there still appears to be little to no benefit. Well, let's turn this around. You seem to really dislike caps. And you vaguely claim security holes, which we have shown know don't exist. So, now, can you clearly explain why precisely you dislike them so much still? And something more technical then systemd shouldn't use them or i don't like them, or they should only be used in the kernel, because these are not technical reasons, they are just claims of personal taste. I will grant you that they aren't particularly expressive, and I will grant you that one day there might be better concepts. But that's not a strong reason not to support them really, that's just a reason to later add support for something better. Technical reasons: 1. They can't be usefully delegated to a namespace. 2. The set of caps that exist is controlled by the kernel, whereas the set of dbus methods is large and controlled by userspace. Caps can't scale to accommodate flexble userspace policies. 3. They don't appear to add value, and avoiding unnecessary complexity is good. 4. They suck. This is a technical issue -- the kernel doesn't allow flexible assignments of caps to processes. This is a problem for kernel API users and it will be a problem for you. --Andy Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Apr 20, 2015 9:07 AM, Lennart Poettering lenn...@poettering.net wrote: On Mon, 20.04.15 08:51, Andy Lutomirski (l...@amacapital.net) wrote: I will grant you that they aren't particularly expressive, and I will grant you that one day there might be better concepts. But that's not a strong reason not to support them really, that's just a reason to later add support for something better. Technical reasons: 1. They can't be usefully delegated to a namespace. Not sure I can parse that. If you use the bus to communicate across namespace boundaries then each side lacks caps for the other. Great, that's how it should be. And same as for uid checks btw... if a uid cannot be translated, then it will not be passed! No. You're right for nspawn-style namespaces, but not for namespaces more broadly. Namespaces are a great way to drop various privileges, but your use of caps prevents certain uses (e.g. confining your hypothetical time-setting daemon in a namespace). I have seen no use of userns for sandboxing normal daemons so far. I have seen tons of daemons using caps for such sandboxing. Sandstorm.io I bet Chromium will follow suit, and don't forget Docker and similar tools. maybe if userns in its current iteration doesn't mix as well as you'd like it with caps this is indication that userns might need some more polish, and not that caps are necessarily the bad guy here? Nope, userns works just fine. I mean, as the one who added most of the sandboxing features we expose in systemd .service files currently (including things like ReadOnlyDirectories=, PrviateTmp=, PrivateNetwork= which are all based on kernel namespacing), I completely fail to see how we could even expose user namespace like this in a good way that would hold water? Capability-based sandboxing on the other hand is much easier to handle, and in many cases a highly efficient way to sandbox stuff. Two examples: There's a world outside systemd and, in that world, programs would like to be able to sandbox themselves. Userns is wonderful for that purpose. systemd-networkd drops privileges, becomes the systemd-network user, only retains CAP_NET_ADMIN. That's actually already a really good sandbox! systemd-timesyncd drops privileges, becomes the systemd-timesync user, only retains CAP_SYS_TIME. And that's actually a really good sandbox too! Both daemons are network facing, hence sandboxing things like this is actually of quite some importance, and it *works*! Today! And it is easy! easy enough for most administrators to grok it easily! And because of that it is actually *good* security! Except that if it's too coarse-gained, it fails to actually separate privileges. And please don't discount these two daemons as irrelevant examples. They are highly relevant, since they run on a good chunk of Linux systems, as one of the few daemons that run really everywhere. Caps *do* have good uses, please don't claim otherwise. They are a *lot* more relevant on today's system than userns at least! (Note that I am not saying that userns are really a bad idea or so, I just would like to ask you to keep things in perspective: caps are *good* -- even though they could be much better. And they are a ton more useful and used than userns right now) OK, they are not very expressive, I granted you that already. But they are still more expressive than uid == 0. That they are not expressive is something I can agree with, as mentioned above, but I don't consider this a real issue not to using them. I mean, it would be great if we had something better in the kernel, like capsicum or whatever, but we don't. And since caps are pretty well supported otherwise on Linux, and they are better then simple uid == 0 checks, I think they should be supported by kdbus too. This is a bad excuse. Given that you're designing something new, you have plenty of room to do better. We are not really in the business in designing comprehensive new access control systems that can be used for in-kernel and in-userspace subsystems. Sure, we also have bus policy, but that given it's non-interactive logic it's not really suitable for the uses where we need uid or caps checks, i.e. dynamic access control within called methods that need to check different privileges dynamically. Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Apr 20, 2015 8:22 AM, Lennart Poettering lenn...@poettering.net wrote: On Mon, 20.04.15 08:08, Andy Lutomirski (l...@amacapital.net) wrote: On Apr 20, 2015 7:57 AM, Lennart Poettering lenn...@poettering.net wrote: On Fri, 17.04.15 09:14, Andy Lutomirski (l...@amacapital.net) wrote: My point here is that there's no real shortage of downsides to this scheme, and there still appears to be little to no benefit. Well, let's turn this around. You seem to really dislike caps. And you vaguely claim security holes, which we have shown know don't exist. So, now, can you clearly explain why precisely you dislike them so much still? And something more technical then systemd shouldn't use them or i don't like them, or they should only be used in the kernel, because these are not technical reasons, they are just claims of personal taste. I will grant you that they aren't particularly expressive, and I will grant you that one day there might be better concepts. But that's not a strong reason not to support them really, that's just a reason to later add support for something better. Technical reasons: 1. They can't be usefully delegated to a namespace. Not sure I can parse that. If you use the bus to communicate across namespace boundaries then each side lacks caps for the other. Great, that's how it should be. And same as for uid checks btw... if a uid cannot be translated, then it will not be passed! No. You're right for nspawn-style namespaces, but not for namespaces more broadly. Namespaces are a great way to drop various privileges, but your use of caps prevents certain uses (e.g. confining your hypothetical time-setting daemon in a namespace). 2. The set of caps that exist is controlled by the kernel, whereas the set of dbus methods is large and controlled by userspace. Caps can't scale to accommodate flexble userspace policies. OK, they are not very expressive, I granted you that already. But they are still more expressive than uid == 0. That they are not expressive is something I can agree with, as mentioned above, but I don't consider this a real issue not to using them. I mean, it would be great if we had something better in the kernel, like capsicum or whatever, but we don't. And since caps are pretty well supported otherwise on Linux, and they are better then simple uid == 0 checks, I think they should be supported by kdbus too. This is a bad excuse. Given that you're designing something new, you have plenty of room to do better. 3. They don't appear to add value, and avoiding unnecessary complexity is good. Well, I disagree on this. I think they are better because more fine-grained than uid == 0 checks. 4. They suck. This is a technical issue -- the kernel doesn't allow flexible assignments of caps to processes. This is a problem for kernel API users and it will be a problem for you. Not a technical reason, unlike you claim. Just a personal taste issue. Sure it is. Caps are defined in the kernel sources and, as seems to have been covered pretty well in this thread, the list of caps is very far from what a userspace dbus service would want to check. --Andy Honestly, I don't think the issues you raise are very convincing Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Apr 17, 2015 4:53 AM, Djalal Harouni tix...@opendz.org wrote: Hi Andy, On Thu, Apr 16, 2015 at 12:30:28PM -0700, Andy Lutomirski wrote: On Thu, Apr 16, 2015 at 11:23 AM, Lennart Poettering lenn...@poettering.net wrote: [...] AFAICT this piece of kdbus code serves to enable a rather odd way to write privilege-separated services to change the time and kill processes. The cost is complex security code that, at best, fails secure in the presence of different user namespaces, and the cost also involves touching a global refcount for each message sent (this might be the *only* thing that would reference init_user_ns's refcount when sending). Oh yeah, the cost is also ABI crap -- if, say, my The global ref-counts on metadata is just a workaround due to userns and caps. I actually thought we already sorted that out? https://lkml.org/lkml/2015/3/25/702 Hmm there are other paths that refs user_ns, the mqueue notification perhaps ? Please note that we also have _per_ user quota accounting, the trade off of accouting prevents further performance penalties on other bus operations. Referring to performance critical code, this code path can just be ignored by to not opt-in for KDBUS_ATTACH_CAPS which is the default behaviour. Quoting that link: It's conditional on KDBUS_ATTACH_CAPS, anyway. Fair enough. [end quote] I don't believe it'll be usefully conditional. Systemd is pretty clearly planning on using it, so you get a silly, if small, performance hit. My point here is that there's no real shortage of downsides to this scheme, and there still appears to be little to no benefit. --Andy Thanks! -- Djalal Harouni http://opendz.org ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Apr 17, 2015 5:42 AM, Simon McVittie simon.mcvit...@collabora.co.uk wrote: On 16/04/15 15:52, Andy Lutomirski wrote: (I really think this dichotomy needs to be removed, *especially* since it looks like code already exists to try to use both metadata sources. This seems like it's just asking for security screw-ups.) Would it address this concern if there was an explicit API separation into things that can't be faked, suitable for authorization and things that could have been faked, only suitable for debugging - such that the uid would be in the first set only, capabilities would be in the first set on kdbus but absent or in the second set on traditional D-Bus, and /proc/*/cmdline would always be in the second set? It would certainly improve the sd-bus code, I think. I'm not a systemd developer, though. From the kernel side, I don't even see the point of reporting caps for debugging IPC things. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Apr 17, 2015 6:05 AM, Cristian Rodríguez crrodrig...@opensuse.org wrote: On Fri, Apr 17, 2015 at 7:51 AM, Lennart Poettering lenn...@poettering.net wrote: Groups *suck* as authentication scheme. If you add one group for each privilege you want, then you'll have a huge number of groups, and that's hardly desirable. It's pretty close to being unmanagable with user/group editors. Also, you can never take group membership away, since users who once where members of group can create sgid binaries which allows them to always return into that group forever. Not to mention, we are running out of system users and groups in distributions (if we didn't already) and some people want us to provide fixed UID/GID system users across distributions for clustering applications...this is a totally unworkable way forward. I believe you're arguing that you think gids are a scarse resource, so you want to save ~2 gids (certainly fewer than 64) by inventing a whole new userspace authorization scheme using *caps* that doesn't even solve the problem that you want to solve. I'm not sure how this is supposed to justify anything. Caps are probably the single least scalable authorization mechanism you could come up with. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Thu, Apr 16, 2015 at 3:23 AM, Tom Gundersen t...@jklm.no wrote: Hi Andy, On Thu, Apr 16, 2015 at 2:55 AM, Andy Lutomirski l...@amacapital.net wrote: Yesterday, I discovered SD_BUS_VTABLE_CAPABILITY. Are there any examples in which it does anything? Please note that you need to be using kdbus to get any capabilities transported, so in dbus1 this does nothing (as on dbus1 using capabilities would be racy). Are you testing this on kdbus? No. I'm looking at sd_bus_query_sender_privilege, which does: r = sd_bus_query_sender_creds(call, SD_BUS_CREDS_UID|SD_BUS_CREDS_EUID|SD_BUS_CREDS_EFFECTIVE_CAPS, creds); That, in turn, does: if (!c || !(c-mask SD_BUS_CREDS_PID)) { /* We couldn't read anything from the call, let's try * to get it from the sender or peer */ if (call-sender) return sd_bus_get_name_creds(call-bus, call-sender, mask, creds); else return sd_bus_get_owner_creds(call-bus, mask, creds); } return bus_creds_extend_by_pid(c, mask, creds); The sd_bus_get_name_creds call seems questionable to me -- isn't that trying to augment missing information from the time of call with metadata from the sender's connection? (I really think this dichotomy needs to be removed, *especially* since it looks like code already exists to try to use both metadata sources. This seems like it's just asking for security screw-ups.) The sd_bus_get_owner_creds call looks totally bogus. I don't know when it would be called, but I hope the answer is provably never in response to a message from an unprivileged user, in which case I'd wonder why it's there. The bus_creds_extend_by_pid is what I'm asking about. When is it used? In any event, under certain (strewn-across-multiple-files with duplicated code in various places) conditions, all three paths above seem to land in bus_creds_add_more. It looks like some of this is gated on whether the sending PID is known (IMO this is odd -- it should be gated by whether you think it's a good idea, not whether a piece of information is known), but maybe mac_selinux_generic_access_check can circumvent that. So why exactly are capabilities only used with kdbus? I don't see it, and I do see code that tries to extract them from /proc. If so, I don't suppose any of you could give me an example of: $ cp `which dbus-send` . $ sudo setcap all=eip dbus-send $ dbus-send [not sure what goes here] that passes an authentication test that would have failed without the setcap? Note that dbus-send is the old dbus1 command, and will always go via the bus proxy that connects dbus1 clients to kdbus. As such, it will only carry the credentials that are available safely on dbus1, and hence not capabilities. So even if you do use kdbus on your machine, this example would not work. To be clear, the reason we do not use the capability logic on dbus1, but only on kdbus is as follows: On kdbus, the capabilities are attached to the method call itself, hence we safely know that at the time the method call was issued by the client side, that client actually really had those capabilities. On dbus1 however, which uses AF_UNIX/SOCK_STREAM, this is not possible: we only get the UID/GID/PID, hence we could only derive the caps from /proc/$PID/status. But this opens this up for a vulnerability: if a client quickly issues a method call, and then exec()s a suid binary, it could happen that the service side would read the capabilities from that SUID process, and not the original process, and the exploit was successful. We cannot allow that, hence no capabilities on dbus1, instead we open up things to a much, much broader uid == 0. kdbus allows us to be much stricter there, by allowing us to check only one specific capability. Agreed, and that's not the only vulnerability. But code to do it certainly exists in systemd. An equivalent example to the one you gave using native kdbus would be: $ cp `which busctl` . $ sudo setcap all=eip busctl $ ./busctl call org.freedesktop.timedate1 /org/freedesktop/timedate1 org.freedesktop.timedate1 SetTimezone sb Europe/London 0 I still don't see why, even if this were bug-free, it's even remotely useful. Keep in mind that basically no one has a capability if they're not root, and dbus users should really use *dbus* mechanisms to retain selected privileges when they drop other privileges. --Andy In the interest of full disclosure, I'm asking because I think that one of two things is true: 1. The SD_BUS_VTABLE_CAPABILITY code is useless and should therefore be deleted. This is not the case. 2. The SD_BUS_VTABLE_CAPABILITY code is exploitably buggy and should therefore be deleted. I have heard this claim from you many times, but so far I haven't been able to figure out what you have in mind. Could you either give an example of an exploit or at least the general scheme you
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Thu, Apr 16, 2015 at 8:59 AM, Lennart Poettering lenn...@poettering.net wrote: On Thu, 16.04.15 07:52, Andy Lutomirski (l...@amacapital.net) wrote: I'm looking at sd_bus_query_sender_privilege, which does: r = sd_bus_query_sender_creds(call, SD_BUS_CREDS_UID|SD_BUS_CREDS_EUID|SD_BUS_CREDS_EFFECTIVE_CAPS, creds); That, in turn, does: if (!c || !(c-mask SD_BUS_CREDS_PID)) { /* We couldn't read anything from the call, let's try * to get it from the sender or peer */ if (call-sender) return sd_bus_get_name_creds(call-bus, call-sender, mask, creds); else return sd_bus_get_owner_creds(call-bus, mask, creds); } return bus_creds_extend_by_pid(c, mask, creds); The sd_bus_get_name_creds call seems questionable to me -- isn't that trying to augment missing information from the time of call with metadata from the sender's connection? (I really think this dichotomy needs to be removed, *especially* since it looks like code already exists to try to use both metadata sources. This seems like it's just asking for security screw-ups.) This information is missing only for dbus1 clients connecting via the dbus1 compat proxy. In that case we need to fall back to connection credentials, because that's how authentication works on dbus1. The sd_bus_get_owner_creds call looks totally bogus. I don't know when it would be called, but I hope the answer is provably never in response to a message from an unprivileged user, in which case I'd wonder why it's there. Messages with empty senders are only used for dbus1 *direct* connections. This is a special setup dbus1 supports where communication is not done via the bus daemon, but via a direct AF_UNIX connection between two peers. It's very seldom used, and mostly a hack to avoid the performance penalty that the dbus-daemon means. These direct connections are entirely redundant in a kdbus world, and you better shouldn't think about them. Ultimately, this code exists in sd-bus for one reason only: in a non-kdbus world systemd must be accessible even if dbus-daemon is not up. Since dbus-daemon is only started relatively late during boot there's a window in early boot where we need to be accessible by some other means: for that we allow privileged clients to connect to us directly. We then still speak the dbus protocol, and offer similar APIs, but we do not use the bus topology. In kdbus all of that goes away, since kdbus is available all the time, since earliest time of boot. For such direct dbus1 connections it's the creds of the peer of the dbus connection that we need to authenticate, that is exposed as owner creds. Or in other words, the if check above you can read like this: if (we_get_a_full_kdbus_message) use_the_attached_creds(); else { if (we_get_a_compat_dbus1_message_via_kdbus_or_native_dbus1) use_the_senders_creds() else if (we_get_a_message_via_dbus_direct_connection()) use_the_peer_creds(); } To make this easier to understand I'll add some comments to this code: http://cgit.freedesktop.org/systemd/systemd/commit/?id=2d0c1561340efff3265fe89b05eae4ee8f4037a7 The bus_creds_extend_by_pid is what I'm asking about. When is it used? It's a noop, unless people OR in SD_BUS_CREDS_AUGMENT into the flags of creds they want. Doing this basically voids your warranty: it means that the creds data shall be augmented with data from /proc, which are good enough for logging, debugging things, but not for security relevant things, such as authorizing message calls. OK, I finally found the relevant AUGMENT check. Our own message authorization call does not make use of this flag. mac_selinux_generic_access_check does. It also uses sender metadata, which I find interesting given that you seem to strongly prefer using message metadata. Also see the comment about this in sd-bus.h: http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h?id=2d0c1561340efff3265fe89b05eae4ee8f4037a7#n89 Or check the man page for busctl that explains this too: http://www.freedesktop.org/software/systemd/man/busctl.html#--augment-creds=BOOL In any event, under certain (strewn-across-multiple-files with duplicated code in various places) conditions, all three paths above seem to land in bus_creds_add_more. It looks like some of this is gated on whether the sending PID is known (IMO this is odd -- it should be gated by whether you think it's a good idea, not whether a piece of information is known), but maybe mac_selinux_generic_access_check can circumvent that. Well, the augmentation code needs the PID to read the data from /proc, that's what it does after all... So why exactly are capabilities only used with kdbus? I don't see it, and I do see code that tries to extract them from /proc
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Thu, Apr 16, 2015 at 9:43 AM, Tom Gundersen t...@jklm.no wrote: On Thu, Apr 16, 2015 at 4:52 PM, Andy Lutomirski l...@amacapital.net wrote: Unshare your user namespace, set things up right, and systemd or any other server will see you as having all capabilities. You've fixed that in kdbus, but you haven't (and probably can't!) fix it in the legacy code, and that legacy code is still there (!). The dbus1 code (which I assume you mean when you say legacy code) does not make use of capabilities, and it should not (see Lennart's answer for all the details). If anything, this should be an argument to move to kdbus with native, race-free capability-passing support. Do I understand correctly, that any concerns you had are about systemd and its dbus1 compat code (which of course we should take seriously too), and that you no longer see any security vulnerabilities in the capability related code of kdbus? The ratio of complexity of capability code the kdbus folks have already written (hundreds of lines across multiple files) to its utility (very near zero AFAICT) is, in my book, not a good sign at all. We have several uses of this, see my mail to Jiri regarding CAP_SYS_BOOT for instance: https://lkml.org/lkml/2015/4/16/219 I read that, but I disagree with you. CAP_SYS_BOOT is the privilege to directly hard-reboot the system, not the privilege to initiate a clean reboot. Keep in mind that, on some recent Windows versions, for the most part you *can't* directly hard-reboot the system; instead you have to give the OS a reason so the OS can log it. IOW, the high-level Windows reboot permission doesn't confer the privilege of directly hard-rebooting. Maybe systemd or GNOME will want to do that some day. However, what we are trying to get to the bottom of is if you see any technical problems with the current kdbus capability handling code. My understanding is that you don't. I have a technical problem with it: it's a design that has insufficient justification. It also seems like it will be quite limiting in the future, *especially* wrt user namespaces. I agree that it's probably not exploitable *if used carefully* in the latest kdbus code. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Thu, Apr 16, 2015 at 10:43 AM, Tom Gundersen t...@jklm.no wrote: On Thu, Apr 16, 2015 at 5:57 PM, Andy Lutomirski l...@amacapital.net wrote: We have several uses of this, see my mail to Jiri regarding CAP_SYS_BOOT for instance: https://lkml.org/lkml/2015/4/16/219 I read that, but I disagree with you. CAP_SYS_BOOT is the privilege to directly hard-reboot the system, not the privilege to initiate a clean reboot. My main point that being allowed to do a hard-reboot, but not to do a clean reboot, makes no sense. However, what we are trying to get to the bottom of is if you see any technical problems with the current kdbus capability handling code. My understanding is that you don't. I have a technical problem with it: it's a design that has insufficient justification. It also seems like it will be quite limiting in the future, *especially* wrt user namespaces. What I was really asking was if you see any actual vulnerabilities. That we have a different technical opinion on the design of this is a different matter. I agree that it's probably not exploitable *if used carefully* in the latest kdbus code. It would be very helpful if you could go into details on why you think more care is needed here than for other things. Is there anything non-trivial going on here that I'm missing? The way capabilites are exposed through kdbus seems perfectly straight-forward to me, so I'm really trying to get my head around your concerns here. So, let me ask explicitly again: * Are there any actual exploitable vulnerabilities you are aware of in the kdbus kernel code? * Are there any actual exploitable vulnerabilities on the sd-bus userspace code you are aware of, regardless if in the kdbus or dbus1 support in it? If you are, please provide us with good explanations, so that we can do something about them. We'd love to fix this! I don't see a vulnerability in the code as such. But you still haven't given a real use case, and vulnerabilities in security mechanisms are often found in the interactions between components. There doesn't seem to be a big picture here. Can you give a plausible real world example along the lines of process A, which has capability B for some decent reason, will send message C to daemon D, and D will permit the request because of capability B, but the request would not otherwise have been permitted. So far it seems like there's a bunch of code that may not be exploitable but doesn't really do anything either. And, honestly, if this is only about rebooting and setting the time, then, even if correct, this is seriously not worth the complexity even if it's completely sensible and correct. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Thu, Apr 16, 2015 at 10:30 AM, Lennart Poettering lenn...@poettering.net wrote: On Thu, 16.04.15 09:53, Andy Lutomirski (l...@amacapital.net) wrote: It's a noop, unless people OR in SD_BUS_CREDS_AUGMENT into the flags of creds they want. Doing this basically voids your warranty: it means that the creds data shall be augmented with data from /proc, which are good enough for logging, debugging things, but not for security relevant things, such as authorizing message calls. OK, I finally found the relevant AUGMENT check. Our own message authorization call does not make use of this flag. mac_selinux_generic_access_check does. It also uses sender metadata, which I find interesting given that you seem to strongly prefer using message metadata. It certainly does. But this is not actually used for authentication either, ly, because the auth code in there only cares for the selinux label, which is available as native connection metadata on dbus1 already. The augmentation is used because the selinux guys want to generate a nice log message when the permission is not granted. The selinux code in systemd was actually contributed by the selinux people a long time ago. It contained code to read data from /proc manually for showing in the messages. I recently reworked this to just use SD_BUS_CREDS_AUGMENT instead, so that reading data from /proc is at least unified at one place. This is safe only because we know that the selinux data is not augmented, but available anyway and safely, and we rely on that. You can take the selinux code in system as a hint, that getting metadata about bus requests that one can log about is nothign we made up, it's very real, and our selinux code added that years ago, and was written by the selinux guys, not us... I will grant you though that it is confusing that we use SD_BUS_CREDS_AUGMENT here like this, and implicitly rely on that the selinux label is not a field that is being augmented. We should make this explicit, absolutely. I'll now add some code that will make this assumption explicit and fails early if the selinux label happens to be augmented. Of course in real-life this is impossible to trigger, but it's certainly helps understanding the code. Agreed, and that's not the only vulnerability. But code to do it certainly exists in systemd. Not the only vulnerability? I am not aware of any vulnerability, can you explain? See below and attached. $ cp `which busctl` . $ sudo setcap all=eip busctl $ ./busctl call org.freedesktop.timedate1 /org/freedesktop/timedate1 org.freedesktop.timedate1 SetTimezone sb Europe/London 0 I still don't see why, even if this were bug-free, it's even remotely useful. Keep in mind that basically no one has a capability if they're not root, and dbus users should really use *dbus* mechanisms to retain selected privileges when they drop other privileges. Well, some of systemd's own deamons run with an uid != 0, but some caps left. And avahi (as a daemon I know very well, that is not part of systemd) does that too, since 10 years actually... It's not that uncommon among system daemons really. And does it use those caps for the purpose of authenticating to dbus services? No, it does not. Because it's unsafe to do that on dbus1. As mentioned before, kdbus makes this safe. If so, perhaps you should reconsider allowing open dbus connections to act as capabilities (the real kind, not the Linux capability crap) themselves. No, they do not use capabilities for bus authentication, since they predate kdbus, and acquiring them on dbus1/af_unix is not safe. Sure. Unshare your user namespace, set things up right, and systemd or any other server will see you as having all capabilities. You've fixed that in kdbus, but you haven't (and probably can't!) fix it in the legacy code, and that legacy code is still there (!). (Actually, that code seems to have been actively developed as recently as November 2014.) Hmm? I do not understand what you are referring to. See the attached code. If you LD_PRELOAD it and send a message that causes the receiver to read CapEff, then *boom*. So either: a) all the code to read CapEff in the systemd tree doesn't do anything, or b) it's wrong, or c) it's only used for diagnostics As an illustration, LD_PRELOAD the attachment into sleep 1h, then cat /proc/`pidof sleep`/status. Note the rather large CapEff value :) Again, this has nothing to do with kdbus, and races aren't needed. I don't see how this is a vulnerability. Reading stuff from /proc is unsafe. Hence we do not use it for authentication. We all know that. I am really not sure what you are trying to prove here. Can you please explain how precisely you think that sd-bus or systemd or the way they use capabilities is exploitable in any way? You keep claiming that, but I never have seen more than vague words about
Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY
On Thu, Apr 16, 2015 at 11:23 AM, Lennart Poettering lenn...@poettering.net wrote: On Thu, 16.04.15 10:52, Andy Lutomirski (l...@amacapital.net) wrote: It would be very helpful if you could go into details on why you think more care is needed here than for other things. Is there anything non-trivial going on here that I'm missing? The way capabilites are exposed through kdbus seems perfectly straight-forward to me, so I'm really trying to get my head around your concerns here. So, let me ask explicitly again: * Are there any actual exploitable vulnerabilities you are aware of in the kdbus kernel code? * Are there any actual exploitable vulnerabilities on the sd-bus userspace code you are aware of, regardless if in the kdbus or dbus1 support in it? If you are, please provide us with good explanations, so that we can do something about them. We'd love to fix this! I don't see a vulnerability in the code as such. But you still haven't given a real use case, and vulnerabilities in security mechanisms are often found in the interactions between components. There doesn't seem to be a big picture here. Can you give a plausible real world example along the lines of process A, which has capability B for some decent reason, will send message C to daemon D, and D will permit the request because of capability B, but the request would not otherwise have been permitted. So far it seems like there's a bunch of code that may not be exploitable but doesn't really do anything either. And, honestly, if this is only about rebooting and setting the time, then, even if correct, this is seriously not worth the complexity even if it's completely sensible and correct. Well, you asked for examples, I gave you two simple, existing ones. Note that the caps-based method authorization is not much used currently, especially not outside of systemd, because it's not safely doable on dbus1, but only on kdbus, and kdbus is not merged into the kernel yet, hence not many projects use it yet. But anyway, here are two more example in systemd itself: systemd itself checks CAP_SYS_KILL for clients asking to kill arbitrary services (which means invoking kill() to all PIDs in the service's cgroup). Similar to this, logind checks CAP_SYS_KILL for clients asking to kill sessions (which means invoking kill() for all PIDS in the session). These aren't the kinds of examples I'm asking about. They're what systemd does, not what use case makes doing that serve a purpose.. Now, to put together a more complex scenario for you: consider a small web UI that can be used to set the system time. It should realy run at minimal privileges, after all it has a surface to the web. Hence you write it as daemon, make it run as a user id of its own, set up a chroot() or a file system namespace, but you do keep CAP_SYS_TIME and a bus connection open. Aha, a real example! But this is a bad example -- systemd should provide the web server with a way to retain the right to issue the one method call that it needs. CAP_SYS_TIME is overbroad. For example, CAP_SYS_TIME also grants the service to bypass whatever auditing and logging systemd would do. And this is also a worthless design pattern, as it seems to be applicable to sending signals and changing the time, and probably very little else. With that setup the web daemon can pretty much only set the system clock, and if it exploited cannot be used for much else. It will not have access to /dev/rtc, due to the file system namespace, but it can nicely set the system clock via timedated still, and pretty much only that, and the clock will be synced to the RTC by it. It can set the time with settimeofday(2), which is undesirable. To map this back to your earlier request for an example. In this case process A is the web UI daemon. Capability B is CAP_SYS_TIME. Message C is the SetTime() bus call. Daemon D is timedated. If the web UI daemon would not have CAP_SYS_TIME it couldn't change the time like this -- unless of course that web UI daemon would run as UID 0 all the time, which is certainly much much less desirable, given that it is a network facing daemon. And if you are not happy with this example, there are numerous others like this of course... Honestly, I doubt that. There are a smallish number of capabilities and a much smaller number that make sense in this manner. Here's what I think is going on: if I propose a kernel patch to change the kernel to let my app do such-and-such, and I convince everyone involved that the patch is bug-free, maintainable, cleanly written, secure, etc, that's still not enough justification to merge it, especially if it's an ABI change. I also need to justify why it's a good idea for my app to do whatever I'm enabling and why the kernel should support it in that way. AFAICT this piece of kdbus code serves to enable a rather odd way to write privilege-separated services
[systemd-devel] SD_BUS_VTABLE_CAPABILITY
Hi all- Yesterday, I discovered SD_BUS_VTABLE_CAPABILITY. Are there any examples in which it does anything? If so, I don't suppose any of you could give me an example of: $ cp `which dbus-send` . $ sudo setcap all=eip dbus-send $ dbus-send [not sure what goes here] that passes an authentication test that would have failed without the setcap? In the interest of full disclosure, I'm asking because I think that one of two things is true: 1. The SD_BUS_VTABLE_CAPABILITY code is useless and should therefore be deleted. 2. The SD_BUS_VTABLE_CAPABILITY code is exploitably buggy and should therefore be deleted. I can't tell which one, since I haven't figured out how to test it realistically in the first place. Most of the protected calls seem to be heavily restricted by dbus policy. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-vconsole-setup fails very slowly
On Thu, Jan 22, 2015 at 6:29 PM, Andy Lutomirski l...@amacapital.net wrote: On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering lenn...@poettering.net wrote: On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote: Hi all- When running virtme (a simple vm gadget) on Fedora 21, the slowest part of bootup by far appears to be systemd-vconsole-setup: # time /usr/lib/systemd/systemd-vconsole-setup putfont: PIO_FONT trying ... ... setfont: putfont: 512,8x16: failed: -1 putfont: PIO_FONT: Invalid argument /usr/bin/setfont failed with error code 71. setfont is not part of systemd, we just invoke it. If that fails, this is a problem somewhere between the VM, the kernel and console-tools. Aha -- I missed that systemd-vconsole-setup calls setfont. I can trigger the same problem by just typing setfont. For whatever reason, my other Fedora 21 computer only has this problem if I type setfont and not if I run systemd-vconcole-setup. My uneducated guess is that your virtual machine boots up with a non-graphical console, and the tool thus tries to upload the fonts into the good old VGA hw text mode glyph tables, and qemu is very slow at that... Or something like that. setfont is doing this: nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 This thing has only a serial console: # cat /proc/consoles ttyS0-W- (EC a)4:64 setfont does this: /* we allow ourselves to hang here for ca 5 seconds, xdm may be playing tricks on us. */ while ((loop++ 20) (i = ioctl(fd, PIO_FONT, buf))) { if (loop = 1) fprintf(stderr, putfont: PIO_FONT trying ...\n); else fprintf(stderr, .); usleep(25); } fprintf(stderr, \n); Alexey, would it make sense to remove this loop or to add a way to turn it off? Ping, everyone? This issue still exists. AFAICT systemd is relying on a really old tool, that that really old tool (setfont) is sometimes delaying boot by a very large amount. Can we either fix the tool (Alexey) or stop using it (systemd people)? --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-vconsole-setup fails very slowly
On Wed, Apr 1, 2015 at 12:32 PM, Kay Sievers k...@vrfy.org wrote: On Wed, Apr 1, 2015 at 8:56 PM, Andy Lutomirski l...@amacapital.net wrote: On Thu, Jan 22, 2015 at 6:29 PM, Andy Lutomirski l...@amacapital.net wrote: On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering lenn...@poettering.net wrote: On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote: Hi all- When running virtme (a simple vm gadget) on Fedora 21, the slowest part of bootup by far appears to be systemd-vconsole-setup: # time /usr/lib/systemd/systemd-vconsole-setup putfont: PIO_FONT trying ... ... setfont: putfont: 512,8x16: failed: -1 putfont: PIO_FONT: Invalid argument /usr/bin/setfont failed with error code 71. setfont is not part of systemd, we just invoke it. If that fails, this is a problem somewhere between the VM, the kernel and console-tools. Aha -- I missed that systemd-vconsole-setup calls setfont. I can trigger the same problem by just typing setfont. For whatever reason, my other Fedora 21 computer only has this problem if I type setfont and not if I run systemd-vconcole-setup. My uneducated guess is that your virtual machine boots up with a non-graphical console, and the tool thus tries to upload the fonts into the good old VGA hw text mode glyph tables, and qemu is very slow at that... Or something like that. setfont is doing this: nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 This thing has only a serial console: # cat /proc/consoles ttyS0-W- (EC a)4:64 setfont does this: /* we allow ourselves to hang here for ca 5 seconds, xdm may be playing tricks on us. */ while ((loop++ 20) (i = ioctl(fd, PIO_FONT, buf))) { if (loop = 1) fprintf(stderr, putfont: PIO_FONT trying ...\n); else fprintf(stderr, .); usleep(25); } fprintf(stderr, \n); Alexey, would it make sense to remove this loop or to add a way to turn it off? Ping, everyone? This issue still exists. AFAICT systemd is relying on a really old tool, that that really old tool (setfont) is sometimes delaying boot by a very large amount. Can we either fix the tool (Alexey) or stop using it (systemd people)? Hmm, why is the vm gadget you run configuring a custom console font at all? If there is no custom font specified in t he config, systemd will not run setfont. It's not intentionally configuring a custom font, but it might be inheriting Fedora's settings. Or did you mean to have vconsole-setup detect that it should not even try to run setfont? Not sure how to find that out. I don't really see how vconsole-setup could get rid of calling setfont from systemd, it is needed in many setups. vconsole-setup could set the font itself instead of using setfont if setfont can't be configured or fixed not to keep retrying for five seconds (!). Ideally, I think that setfont would just stop retrying on failure. Or perhaps all of this could go through udev or some other mechanism that doesn't try to set the font until the device actually exists. But the console system is weird and may be that's hard. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-vconsole-setup fails very slowly
On Wed, Apr 1, 2015 at 2:36 PM, Kay Sievers k...@vrfy.org wrote: On Wed, Apr 1, 2015 at 11:19 PM, Andy Lutomirski l...@amacapital.net wrote: On Wed, Apr 1, 2015 at 1:53 PM, Kay Sievers k...@vrfy.org wrote: On Wed, Apr 1, 2015 at 10:45 PM, Andy Lutomirski l...@amacapital.net wrote: On Apr 1, 2015 12:56 PM, Kay Sievers k...@vrfy.org wrote: Do you have an idea why the VM does not accept the custom font? If that is something obvious, and we can detect it, we could make vconsole-setup check for it. But then again, fixing setfont seems like the obvious fix here. I assume it's because the VM has no graphical console at all. We check the existence of the corresponding /dev/vcs%i, to check if the tty is allocated where we want to apply the font to. Do these devices exist on the running machine? Yes: # ls /dev/vcs* /dev/vcs /dev/vcs2 /dev/vcs4 /dev/vcsa1 /dev/vcsa3 /dev/vcs1 /dev/vcs3 /dev/vcsa /dev/vcsa2 /dev/vcsa4 Looking at the code, the vc_screen.c code seems to create those devices unconditionally. They should only get created when something accesses the corresponding tty. deallocvt(1) can kill unused ones and the device nodes should disappear. deallocvt doesn't seem to kill those device nodes for me. And what does this say? grep . /sys/class/tty/tty0/active /sys/class/tty/console/active # grep . /sys/class/tty/tty0/active /sys/class/tty/console/active /sys/class/tty/tty0/active:tty1 /sys/class/tty/console/active:ttyS0 vcs1 has, roughly: early console in decompress_kernel Decompressing Linux... Parsing ELF... done. Booting the kernel. Now I'm wondering how that buffer came to be. In any event, some tracing of the code suggests that I have vga_video_type == VIDEO_TYPE_CGA, and that fails if (vga_video_type VIDEO_TYPE_EGAM) in vgacon_font_set. Indeed, /proc/ioports has: 03d4-03d5 : cga and dmesg says: [0.00] Console: colour *CGA 80x25 I don't see this information in sysfs anywhere. Perhaps checking for an active console and detecting -EINVAL from vgacon_font_get would work. Hmm, yeah, maybe we could try one of the font-related ioctls to find out if the driver supports that before we spawn setfont. /proc/fb is empty on this VM, so maybe that would help. Grr, this stuff is really old and crufty. The offending qemu command line args appear to be -vga none -display none. I assume I have CGA because it's the fallback case in vgacon.c if nothing matches. Hehe, blast from the past. :) If you give kvm a VGA device, it all works fine? I just tried it. setfont succeeds, and the VGA device matches /dev/vcs's contents. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-vconsole-setup fails very slowly
On Wed, Apr 1, 2015 at 2:47 PM, Kay Sievers k...@vrfy.org wrote: On Wed, Apr 1, 2015 at 11:38 PM, Andy Lutomirski l...@amacapital.net wrote: On Wed, Apr 1, 2015 at 2:36 PM, Kay Sievers k...@vrfy.org wrote: They should only get created when something accesses the corresponding tty. deallocvt(1) can kill unused ones and the device nodes should disappear. deallocvt doesn't seem to kill those device nodes for me. Seems to work here: # ls -l /dev/vcs[6789] crw-rw 1 root tty 7, 6 Apr 1 22:21 /dev/vcs6 # cat /dev/tty7 ^C # cat /dev/tty9 ^C # ls -l /dev/vcs[6789] crw-rw 1 root tty 7, 6 Apr 1 22:21 /dev/vcs6 crw-rw 1 root tty 7, 7 Apr 1 23:42 /dev/vcs7 crw-rw 1 root tty 7, 9 Apr 1 23:42 /dev/vcs9 # deallocvt 7 # ls -l /dev/vcs[6789] crw-rw 1 root tty 7, 6 Apr 1 22:21 /dev/vcs6 crw-rw 1 root tty 7, 9 Apr 1 23:42 /dev/vcs9 # deallocvt 9 # ls -l /dev/vcs[6789] crw-rw 1 root tty 7, 6 Apr 1 22:21 /dev/vcs6 Aha. It seems that I have something holding tty1-tty4 open. I'll fix that on my end. Will that make vconsole-setup stop calling setfont? If so, that will indirectly solve my problem. (Although... I don't see why the presence or absence of in-use vts should affect font loading. Also, it seems like vcs1 shows up no matter what I do.) The offending qemu command line args appear to be -vga none -display none. I assume I have CGA because it's the fallback case in vgacon.c if nothing matches. Hehe, blast from the past. :) If you give kvm a VGA device, it all works fine? I just tried it. setfont succeeds, and the VGA device matches /dev/vcs's contents. Ah, nice. If we figure out some dummy font-related call to check if the kernel supports font handling at all, we could just add that to vconsole-setup, I guess. Kay -- Andy Lutomirski AMA Capital Management, LLC ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-vconsole-setup fails very slowly
On Wed, Apr 1, 2015 at 1:53 PM, Kay Sievers k...@vrfy.org wrote: On Wed, Apr 1, 2015 at 10:45 PM, Andy Lutomirski l...@amacapital.net wrote: On Apr 1, 2015 12:56 PM, Kay Sievers k...@vrfy.org wrote: Do you have an idea why the VM does not accept the custom font? If that is something obvious, and we can detect it, we could make vconsole-setup check for it. But then again, fixing setfont seems like the obvious fix here. I assume it's because the VM has no graphical console at all. We check the existence of the corresponding /dev/vcs%i, to check if the tty is allocated where we want to apply the font to. Do these devices exist on the running machine? Yes: # ls /dev/vcs* /dev/vcs /dev/vcs2 /dev/vcs4 /dev/vcsa1 /dev/vcsa3 /dev/vcs1 /dev/vcs3 /dev/vcsa /dev/vcsa2 /dev/vcsa4 Looking at the code, the vc_screen.c code seems to create those devices unconditionally. And what does this say? grep . /sys/class/tty/tty0/active /sys/class/tty/console/active # grep . /sys/class/tty/tty0/active /sys/class/tty/console/active /sys/class/tty/tty0/active:tty1 /sys/class/tty/console/active:ttyS0 vcs1 has, roughly: early console in decompress_kernel Decompressing Linux... Parsing ELF... done. Booting the kernel. Now I'm wondering how that buffer came to be. In any event, some tracing of the code suggests that I have vga_video_type == VIDEO_TYPE_CGA, and that fails if (vga_video_type VIDEO_TYPE_EGAM) in vgacon_font_set. Indeed, /proc/ioports has: 03d4-03d5 : cga and dmesg says: [0.00] Console: colour *CGA 80x25 I don't see this information in sysfs anywhere. Perhaps checking for an active console and detecting -EINVAL from vgacon_font_get would work. /proc/fb is empty on this VM, so maybe that would help. Grr, this stuff is really old and crufty. The offending qemu command line args appear to be -vga none -display none. I assume I have CGA because it's the fallback case in vgacon.c if nothing matches. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-vconsole-setup fails very slowly
On Apr 1, 2015 12:56 PM, Kay Sievers k...@vrfy.org wrote: On Wed, Apr 1, 2015 at 9:36 PM, Andy Lutomirski l...@amacapital.net wrote: On Wed, Apr 1, 2015 at 12:32 PM, Kay Sievers k...@vrfy.org wrote: On Wed, Apr 1, 2015 at 8:56 PM, Andy Lutomirski l...@amacapital.net wrote: On Thu, Jan 22, 2015 at 6:29 PM, Andy Lutomirski l...@amacapital.net wrote: On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering lenn...@poettering.net wrote: On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote: Hi all- When running virtme (a simple vm gadget) on Fedora 21, the slowest part of bootup by far appears to be systemd-vconsole-setup: # time /usr/lib/systemd/systemd-vconsole-setup putfont: PIO_FONT trying ... ... setfont: putfont: 512,8x16: failed: -1 putfont: PIO_FONT: Invalid argument /usr/bin/setfont failed with error code 71. setfont is not part of systemd, we just invoke it. If that fails, this is a problem somewhere between the VM, the kernel and console-tools. Aha -- I missed that systemd-vconsole-setup calls setfont. I can trigger the same problem by just typing setfont. For whatever reason, my other Fedora 21 computer only has this problem if I type setfont and not if I run systemd-vconcole-setup. My uneducated guess is that your virtual machine boots up with a non-graphical console, and the tool thus tries to upload the fonts into the good old VGA hw text mode glyph tables, and qemu is very slow at that... Or something like that. setfont is doing this: nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 This thing has only a serial console: # cat /proc/consoles ttyS0-W- (EC a)4:64 setfont does this: /* we allow ourselves to hang here for ca 5 seconds, xdm may be playing tricks on us. */ while ((loop++ 20) (i = ioctl(fd, PIO_FONT, buf))) { if (loop = 1) fprintf(stderr, putfont: PIO_FONT trying ...\n); else fprintf(stderr, .); usleep(25); } fprintf(stderr, \n); Alexey, would it make sense to remove this loop or to add a way to turn it off? Ping, everyone? This issue still exists. AFAICT systemd is relying on a really old tool, that that really old tool (setfont) is sometimes delaying boot by a very large amount. Can we either fix the tool (Alexey) or stop using it (systemd people)? Hmm, why is the vm gadget you run configuring a custom console font at all? If there is no custom font specified in t he config, systemd will not run setfont. It's not intentionally configuring a custom font, but it might be inheriting Fedora's settings. Ideally, /etc/vconsole.conf does not even exist in a default setup. It is only needed for foreign language keyboard support or more exotic font requirements. Or did you mean to have vconsole-setup detect that it should not even try to run setfont? Not sure how to find that out. I don't really see how vconsole-setup could get rid of calling setfont from systemd, it is needed in many setups. vconsole-setup could set the font itself instead of using setfont if setfont can't be configured or fixed not to keep retrying for five seconds (!). It is a rather complex logic which would need to be duplicated in systemd. We so far have avoided it, because the kernel VC font and keymaps are so conceptually limited, that it does not really make sense to build a modern system on top of it. If systemd gets advanvced console support with systemd-consoled, we need full unicode support, high-dpi display support, display hotplug, ..., all things the kernel's vc stuff will never give us. That is why we rely on setfont and loadkeys for now. Ideally, I think that setfont would just stop retrying on failure. Right, that sounds like a simple and sensible fix. Too bad no one from the kbd list has replied :( Or perhaps all of this could go through udev or some other mechanism that doesn't try to set the font until the device actually exists. But the console system is weird and may be that's hard
Re: [systemd-devel] systemd-vconsole-setup fails very slowly
On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering lenn...@poettering.net wrote: On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote: Hi all- When running virtme (a simple vm gadget) on Fedora 21, the slowest part of bootup by far appears to be systemd-vconsole-setup: # time /usr/lib/systemd/systemd-vconsole-setup putfont: PIO_FONT trying ... ... setfont: putfont: 512,8x16: failed: -1 putfont: PIO_FONT: Invalid argument /usr/bin/setfont failed with error code 71. setfont is not part of systemd, we just invoke it. If that fails, this is a problem somewhere between the VM, the kernel and console-tools. Aha -- I missed that systemd-vconsole-setup calls setfont. I can trigger the same problem by just typing setfont. For whatever reason, my other Fedora 21 computer only has this problem if I type setfont and not if I run systemd-vconcole-setup. My uneducated guess is that your virtual machine boots up with a non-graphical console, and the tool thus tries to upload the fonts into the good old VGA hw text mode glyph tables, and qemu is very slow at that... Or something like that. setfont is doing this: nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 nanosleep({0, 25000}, NULL) = 0 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument) write(2, ., 1.)= 1 This thing has only a serial console: # cat /proc/consoles ttyS0-W- (EC a)4:64 setfont does this: /* we allow ourselves to hang here for ca 5 seconds, xdm may be playing tricks on us. */ while ((loop++ 20) (i = ioctl(fd, PIO_FONT, buf))) { if (loop = 1) fprintf(stderr, putfont: PIO_FONT trying ...\n); else fprintf(stderr, .); usleep(25); } fprintf(stderr, \n); Alexey, would it make sense to remove this loop or to add a way to turn it off? Thanks, Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] systemd-vconsole-setup fails very slowly
Hi all- When running virtme (a simple vm gadget) on Fedora 21, the slowest part of bootup by far appears to be systemd-vconsole-setup: # time /usr/lib/systemd/systemd-vconsole-setup putfont: PIO_FONT trying ... ... setfont: putfont: 512,8x16: failed: -1 putfont: PIO_FONT: Invalid argument /usr/bin/setfont failed with error code 71. real0m5.361s user0m0.079s sys0m0.093s I have no idea what it's trying to do, why it thinks it should work, or why it doesn't work, but can it be updated to fail quickly. The VM in question has no graphics: # lspci 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03) 00:02.0 Unclassified device [0002]: Red Hat, Inc Virtio filesystem 00:03.0 System peripheral: Intel Corporation 6300ESB Watchdog Timer To reproduce on Fedora 21: $ sudo dnf install virtme $ virtme-run --installed-kernel Then twiddle your thumbs for ~6.3 seconds instead of the ~1.3 seconds of thumb twiddling that would be required AFAICT without this issue. Thanks, Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Supporting U2F over HID on Linux?
On Tue, Dec 9, 2014 at 12:46 PM, Andy Lutomirski l...@amacapital.net wrote: On Mon, Nov 3, 2014 at 12:41 PM, Andy Lutomirski l...@amacapital.net wrote: On Mon, Nov 3, 2014 at 12:21 PM, Jiri Kosina jkos...@suse.cz wrote: On Mon, 3 Nov 2014, David Herrmann wrote: Agreed, mostly. My only real concern is that this could be annoying for the userspace developers who will need to target Linux and HIDAPI separately. Admittedly the Linux support will be trivial. I see. I'll not stop you from using hidraw, I'd just not recommend it. Especially for security stuff I dislike exposing the whole HID device writable. But yeah, I guess you got my point. Alright, I am basically thinking loudly now, but how about we allow HID drivers that would override default hidraw interface? I am very well aware of the fact that this could be opening a can of worms, so we'll have to make it very restrictive. How about, let's say, we allow HID drivers to provide custom hidraw interface (completely overriding the one that HID core would normally create) only for cases such as: - the intent is to expose only certain parts of a combined device - the intent is to introduce some level of access control I.e. still no interference of kernel with data parsing allowed. Hmm. Would this be like a filter on hidraw actions? How would udev distinguish these special hidraw devices from normal hidraw devices? Also, for U2F, this could be a little awkward. There's some crazy fragmentation stuff to allow a U2F request to be split across HID requests, and I think a kernel driver would much rather get the original unfragmented application request. I started writing a driver for this. I got enumeration working. I assume I should create a u2f device class, and then... ? Where am I supposed to get my character device from? Do I register my own chrdev major? Do I use misc? Is there some input thing I'm supposed to use? Another question: I'm hitting this in hid_input_report: if (down_trylock(hid-driver_input_lock)) { pr_err(HID: trylock failed\n); // I added this return -EBUSY; } This is a problem for u2f: u2f reports are actual protocol messages, and there isn't a retransmit mechanism. Losing messages randomly causes the handshake to fail, and then nothing works. I *think* that the only way I can hit that failure is during probe (or if the USB stack somehow completes two transfers at once). This still makes it quite awkward to start IO from the probe routine. I could start a work item to take driver_input_lock, release it again, and then start the handshake, but that sucks. Ideas? --Andy Thanks, Andy --Andy I can give the driver a try. It'll actually be simpler than the spec makes it out, since a real kernel driver will have no need to arbitrate with itself. I'll gladly review any custom HID drivers. Feel free to put me on CC when sending to linux-input. There's also a lot of examples in drivers/hid/ in case you need a head-start (especially if you need the raw_event callback). Same here, of course. Please always CC me in parallel to sending to linux-input@ to make sure that the patch doesn't fall in between cracks. Thanks, -- Jiri Kosina SUSE Labs -- Andy Lutomirski AMA Capital Management, LLC -- Andy Lutomirski AMA Capital Management, LLC -- Andy Lutomirski AMA Capital Management, LLC ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Supporting U2F over HID on Linux?
On Mon, Nov 3, 2014 at 12:41 PM, Andy Lutomirski l...@amacapital.net wrote: On Mon, Nov 3, 2014 at 12:21 PM, Jiri Kosina jkos...@suse.cz wrote: On Mon, 3 Nov 2014, David Herrmann wrote: Agreed, mostly. My only real concern is that this could be annoying for the userspace developers who will need to target Linux and HIDAPI separately. Admittedly the Linux support will be trivial. I see. I'll not stop you from using hidraw, I'd just not recommend it. Especially for security stuff I dislike exposing the whole HID device writable. But yeah, I guess you got my point. Alright, I am basically thinking loudly now, but how about we allow HID drivers that would override default hidraw interface? I am very well aware of the fact that this could be opening a can of worms, so we'll have to make it very restrictive. How about, let's say, we allow HID drivers to provide custom hidraw interface (completely overriding the one that HID core would normally create) only for cases such as: - the intent is to expose only certain parts of a combined device - the intent is to introduce some level of access control I.e. still no interference of kernel with data parsing allowed. Hmm. Would this be like a filter on hidraw actions? How would udev distinguish these special hidraw devices from normal hidraw devices? Also, for U2F, this could be a little awkward. There's some crazy fragmentation stuff to allow a U2F request to be split across HID requests, and I think a kernel driver would much rather get the original unfragmented application request. I started writing a driver for this. I got enumeration working. I assume I should create a u2f device class, and then... ? Where am I supposed to get my character device from? Do I register my own chrdev major? Do I use misc? Is there some input thing I'm supposed to use? Thanks, Andy --Andy I can give the driver a try. It'll actually be simpler than the spec makes it out, since a real kernel driver will have no need to arbitrate with itself. I'll gladly review any custom HID drivers. Feel free to put me on CC when sending to linux-input. There's also a lot of examples in drivers/hid/ in case you need a head-start (especially if you need the raw_event callback). Same here, of course. Please always CC me in parallel to sending to linux-input@ to make sure that the patch doesn't fall in between cracks. Thanks, -- Jiri Kosina SUSE Labs -- Andy Lutomirski AMA Capital Management, LLC -- Andy Lutomirski AMA Capital Management, LLC ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid
On Sun, Nov 30, 2014 at 11:03 PM, Konstantin Khlebnikov koc...@gmail.com wrote: Hmm. What about per-task/thread UUID? exported via separate file: /proc/PID/uuid It could be created at the first access, thus this wouldn't shlowdown clone(). Also it could be droped at execve(), so it'll describe execution context more precisely than pid. I'd be all for this if it weren't for two issues: 1. This thing needs to work as soon as init is started, and we can't reliably generate random numbers that early. 2. Migration / CRIU is rather fundamentally at odds with making anything universally unique, as opposed to just per-user-ns. So, given that I don't think we can actually provide a universally unique pid-like thing, I'd rather not even try. That being said, I want to rework this a little bit. I think that highpid should be restricted to the range 2^32 through 2^64 - 4096. The former prevents anyone from confusing highpid with regular pid, and the latter means that we don't need to worry about confusion between errors and valid highpids (e.g. -1 will never be a highpid). Implementing that will be only mildly annoying. --Andy On Sat, Nov 29, 2014 at 2:05 AM, Andy Lutomirski l...@amacapital.net wrote: Pid reuse is common, which means that it's difficult or impossible to read information about a pid from /proc without races. This introduces a second number associated with each (task, pidns) pair called highpid. Highpid is a 64-bit number, and, barring extremely unlikely circumstances or outright error, a (highpid, pid) will never be reused. With just this change, a program can open /proc/PID/status, read the Highpid field, and confirm that it has the expected value. If the pid has been reused, then highpid will be different. The initial implementation is straightforward: highpid is simply a 64-bit counter. If a high-end system can fork every 3 ns (which would be amazing, given that just allocating a pid requires at atomic operation), it would take well over 1000 years for highpid to wrap. For CRIU's benefit, the next highpid can be set by a privileged user. NB: The sysctl stuff only works on 64-bit systems. If the approach looks good, I'll fix that somehow. Signed-off-by: Andy Lutomirski l...@amacapital.net --- If this goes in, there's plenty of room to add new interfaces to make this more useful. For example, we could add a fancier tgkill that adds and validates hightgid and highpid, and we might want to add a syscall to read one's own hightgid and highpid. These would be quite useful for pidfiles. David, would this be useful for kdbus? CRIU people: will this be unduly difficult to support in CRIU? If you all think this is a good idea, I'll fix the sysctl stuff and document it. It might also be worth adding Hightgid to status. fs/proc/array.c | 5 - include/linux/pid.h | 2 ++ include/linux/pid_namespace.h | 1 + kernel/pid.c | 19 +++ kernel/pid_namespace.c| 22 ++ 5 files changed, 44 insertions(+), 5 deletions(-) diff --git a/fs/proc/array.c b/fs/proc/array.c index cd3653e4f35c..f1e0e69d19f9 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, int g; struct fdtable *fdt = NULL; const struct cred *cred; + const struct upid *upid; pid_t ppid, tpid; rcu_read_lock(); @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, if (tracer) tpid = task_pid_nr_ns(tracer, ns); } + upid = pid_upid_ns(pid, ns); cred = get_task_cred(p); seq_printf(m, State:\t%s\n Tgid:\t%d\n Ngid:\t%d\n Pid:\t%d\n + Highpid:\t%llu\n PPid:\t%d\n TracerPid:\t%d\n Uid:\t%d\t%d\t%d\t%d\n @@ -183,7 +186,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, get_task_state(p), task_tgid_nr_ns(p, ns), task_numa_group_id(p), - pid_nr_ns(pid, ns), + upid ? upid-nr : 0, upid ? upid-highnr : 0, ppid, tpid, from_kuid_munged(user_ns, cred-uid), from_kuid_munged(user_ns, cred-euid), diff --git a/include/linux/pid.h b/include/linux/pid.h index 23705a53abba..ece70b64d04c 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -51,6 +51,7 @@ struct upid { /* Try to keep pid_chain in the same cacheline as nr for find_vpid */ int nr; struct pid_namespace *ns; + u64 highnr; struct hlist_node pid_chain; }; @@ -170,6 +171,7 @@ static inline pid_t pid_nr(struct pid *pid) } pid_t pid_nr_ns
Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid
On Mon, Dec 1, 2014 at 8:39 AM, Konstantin Khlebnikov koc...@gmail.com wrote: On Mon, Dec 1, 2014 at 7:21 PM, Andy Lutomirski l...@amacapital.net wrote: On Sun, Nov 30, 2014 at 11:03 PM, Konstantin Khlebnikov koc...@gmail.com wrote: Hmm. What about per-task/thread UUID? exported via separate file: /proc/PID/uuid It could be created at the first access, thus this wouldn't shlowdown clone(). Also it could be droped at execve(), so it'll describe execution context more precisely than pid. I'd be all for this if it weren't for two issues: 1. This thing needs to work as soon as init is started, and we can't reliably generate random numbers that early. 2. Migration / CRIU is rather fundamentally at odds with making anything universally unique, as opposed to just per-user-ns. So, given that I don't think we can actually provide a universally unique pid-like thing, I'd rather not even try. Well... another idea: what about pid generation counter? Of course it should be per-pid-ns. As I see struct upid has nice 32-bit hole next to 'nr' field. (on 64-bit systems) I thought about that, but it has two issues: 1. Implementing it will be pain. The pid allocation algorithm is already complicated, and it wasn't obvious to me how to accurately detect wraparound without racing against other wrapping users. 2. I don't think it will be reliable. Suppose that there are pid_max - 1 processes. One of them can repeatedly clone and exit, incrementing the generation counter each time. After 2^32 iterations, which won't take all that long, the pid generation will wrap. --Andy That being said, I want to rework this a little bit. I think that highpid should be restricted to the range 2^32 through 2^64 - 4096. The former prevents anyone from confusing highpid with regular pid, and the latter means that we don't need to worry about confusion between errors and valid highpids (e.g. -1 will never be a highpid). Implementing that will be only mildly annoying. --Andy On Sat, Nov 29, 2014 at 2:05 AM, Andy Lutomirski l...@amacapital.net wrote: Pid reuse is common, which means that it's difficult or impossible to read information about a pid from /proc without races. This introduces a second number associated with each (task, pidns) pair called highpid. Highpid is a 64-bit number, and, barring extremely unlikely circumstances or outright error, a (highpid, pid) will never be reused. With just this change, a program can open /proc/PID/status, read the Highpid field, and confirm that it has the expected value. If the pid has been reused, then highpid will be different. The initial implementation is straightforward: highpid is simply a 64-bit counter. If a high-end system can fork every 3 ns (which would be amazing, given that just allocating a pid requires at atomic operation), it would take well over 1000 years for highpid to wrap. For CRIU's benefit, the next highpid can be set by a privileged user. NB: The sysctl stuff only works on 64-bit systems. If the approach looks good, I'll fix that somehow. Signed-off-by: Andy Lutomirski l...@amacapital.net --- If this goes in, there's plenty of room to add new interfaces to make this more useful. For example, we could add a fancier tgkill that adds and validates hightgid and highpid, and we might want to add a syscall to read one's own hightgid and highpid. These would be quite useful for pidfiles. David, would this be useful for kdbus? CRIU people: will this be unduly difficult to support in CRIU? If you all think this is a good idea, I'll fix the sysctl stuff and document it. It might also be worth adding Hightgid to status. fs/proc/array.c | 5 - include/linux/pid.h | 2 ++ include/linux/pid_namespace.h | 1 + kernel/pid.c | 19 +++ kernel/pid_namespace.c| 22 ++ 5 files changed, 44 insertions(+), 5 deletions(-) diff --git a/fs/proc/array.c b/fs/proc/array.c index cd3653e4f35c..f1e0e69d19f9 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, int g; struct fdtable *fdt = NULL; const struct cred *cred; + const struct upid *upid; pid_t ppid, tpid; rcu_read_lock(); @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, if (tracer) tpid = task_pid_nr_ns(tracer, ns); } + upid = pid_upid_ns(pid, ns); cred = get_task_cred(p); seq_printf(m, State:\t%s\n Tgid:\t%d\n Ngid:\t%d\n Pid:\t%d\n + Highpid:\t%llu\n PPid:\t%d\n TracerPid:\t%d\n Uid:\t%d\t%d\t%d\t%d\n @@ -183,7 +186,7 @@ static inline void task_state
Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid
On Nov 30, 2014 9:45 AM, David Herrmann dh.herrm...@gmail.com wrote: Hi Andy On Sat, Nov 29, 2014 at 12:05 AM, Andy Lutomirski l...@amacapital.net wrote: Pid reuse is common, which means that it's difficult or impossible to read information about a pid from /proc without races. This introduces a second number associated with each (task, pidns) pair called highpid. Highpid is a 64-bit number, and, barring extremely unlikely circumstances or outright error, a (highpid, pid) will never be reused. With just this change, a program can open /proc/PID/status, read the Highpid field, and confirm that it has the expected value. If the pid has been reused, then highpid will be different. The initial implementation is straightforward: highpid is simply a 64-bit counter. If a high-end system can fork every 3 ns (which would be amazing, given that just allocating a pid requires at atomic operation), it would take well over 1000 years for highpid to wrap. For CRIU's benefit, the next highpid can be set by a privileged user. NB: The sysctl stuff only works on 64-bit systems. If the approach looks good, I'll fix that somehow. Signed-off-by: Andy Lutomirski l...@amacapital.net --- If this goes in, there's plenty of room to add new interfaces to make this more useful. For example, we could add a fancier tgkill that adds and validates hightgid and highpid, and we might want to add a syscall to read one's own hightgid and highpid. These would be quite useful for pidfiles. David, would this be useful for kdbus? Much appreciated! This would serve well as replacement for 'starttime'. I'd prefer if pid_t was 64bit, but I guess that ship sailed long ago. Though, your patch might in the end just introduce a new pid64, which replaces the old pid and lives in parallel. Anyway, considering that we actually want the same pid-reuse protection for tid, tgid, ppid and so on, we'd have to add a 'starttime' for all of them. Sounds ugly.. so we might just end up dropping 'starttime' and introduce KDBUS_ITEM_PIDS2 whenever a 'fix' is merged upstream. I don't understand we want. Doesn't kdbus currently only think about tid and tgid? In any event, I just looked at the kdbus code (https://github.com/gregkh/kdbus/blob/master/metadata.c), and I see: meta-pid = get_pid(task_pid(current)); meta-tgid = get_pid(task_tgid(current)); meta-starttime = current-start_time, I don't understand how that concept of starttime is particularly useful. Isn't that the start time associated with tid? How can that be useful when looking up tgid? Regardless, my patch here associates a highpid with each struct pid, and both tid and tgid are just struct pids, so you get both. I agree that adding Highppid to /proc might be useful, but that seems like a future extension that has nothing to do with kdbus. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid
On Nov 30, 2014 1:47 AM, Florian Weimer f...@deneb.enyo.de wrote: * Andy Lutomirski: The initial implementation is straightforward: highpid is simply a 64-bit counter. If a high-end system can fork every 3 ns (which would be amazing, given that just allocating a pid requires at atomic operation), it would take well over 1000 years for highpid to wrap. I'm not sure if I'm reading the patch correctly, but is the counter namespaced? If yes, why? It's namespaced so that CRIU can migrate/restore a whole pid namespace. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid
On Nov 28, 2014 9:24 PM, Greg KH g...@kroah.com wrote: On Fri, Nov 28, 2014 at 03:05:01PM -0800, Andy Lutomirski wrote: Pid reuse is common, which means that it's difficult or impossible to read information about a pid from /proc without races. This introduces a second number associated with each (task, pidns) pair called highpid. Highpid is a 64-bit number, and, barring extremely unlikely circumstances or outright error, a (highpid, pid) will never be reused. With just this change, a program can open /proc/PID/status, read the Highpid field, and confirm that it has the expected value. If the pid has been reused, then highpid will be different. The initial implementation is straightforward: highpid is simply a 64-bit counter. If a high-end system can fork every 3 ns (which would be amazing, given that just allocating a pid requires at atomic operation), it would take well over 1000 years for highpid to wrap. For CRIU's benefit, the next highpid can be set by a privileged user. NB: The sysctl stuff only works on 64-bit systems. If the approach looks good, I'll fix that somehow. Signed-off-by: Andy Lutomirski l...@amacapital.net --- If this goes in, there's plenty of room to add new interfaces to make this more useful. For example, we could add a fancier tgkill that adds and validates hightgid and highpid, and we might want to add a syscall to read one's own hightgid and highpid. These would be quite useful for pidfiles. David, would this be useful for kdbus? CRIU people: will this be unduly difficult to support in CRIU? If you all think this is a good idea, I'll fix the sysctl stuff and document it. It might also be worth adding Hightgid to status. fs/proc/array.c | 5 - include/linux/pid.h | 2 ++ include/linux/pid_namespace.h | 1 + kernel/pid.c | 19 +++ kernel/pid_namespace.c| 22 ++ 5 files changed, 44 insertions(+), 5 deletions(-) diff --git a/fs/proc/array.c b/fs/proc/array.c index cd3653e4f35c..f1e0e69d19f9 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, int g; struct fdtable *fdt = NULL; const struct cred *cred; + const struct upid *upid; pid_t ppid, tpid; rcu_read_lock(); @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, if (tracer) tpid = task_pid_nr_ns(tracer, ns); } + upid = pid_upid_ns(pid, ns); cred = get_task_cred(p); seq_printf(m, State:\t%s\n Tgid:\t%d\n Ngid:\t%d\n Pid:\t%d\n + Highpid:\t%llu\n PPid:\t%d\n TracerPid:\t%d\n Uid:\t%d\t%d\t%d\t%d\n Changing existing proc files like this is dangerous and can cause lots of breakage in userspace programs if you are not careful. Usually adding fields to the end of the file is best, but sometimes even then things can break :( Point taken. I had hoped that everything would parse /proc/PID/status by looking for the lamed headers. I'll see what the history of new fields in there is, but I can change this easily enough. --Andy -- To unsubscribe from this list: send the line unsubscribe linux-api in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [RFC PATCH] proc, pidns: Add highpid
Pid reuse is common, which means that it's difficult or impossible to read information about a pid from /proc without races. This introduces a second number associated with each (task, pidns) pair called highpid. Highpid is a 64-bit number, and, barring extremely unlikely circumstances or outright error, a (highpid, pid) will never be reused. With just this change, a program can open /proc/PID/status, read the Highpid field, and confirm that it has the expected value. If the pid has been reused, then highpid will be different. The initial implementation is straightforward: highpid is simply a 64-bit counter. If a high-end system can fork every 3 ns (which would be amazing, given that just allocating a pid requires at atomic operation), it would take well over 1000 years for highpid to wrap. For CRIU's benefit, the next highpid can be set by a privileged user. NB: The sysctl stuff only works on 64-bit systems. If the approach looks good, I'll fix that somehow. Signed-off-by: Andy Lutomirski l...@amacapital.net --- If this goes in, there's plenty of room to add new interfaces to make this more useful. For example, we could add a fancier tgkill that adds and validates hightgid and highpid, and we might want to add a syscall to read one's own hightgid and highpid. These would be quite useful for pidfiles. David, would this be useful for kdbus? CRIU people: will this be unduly difficult to support in CRIU? If you all think this is a good idea, I'll fix the sysctl stuff and document it. It might also be worth adding Hightgid to status. fs/proc/array.c | 5 - include/linux/pid.h | 2 ++ include/linux/pid_namespace.h | 1 + kernel/pid.c | 19 +++ kernel/pid_namespace.c| 22 ++ 5 files changed, 44 insertions(+), 5 deletions(-) diff --git a/fs/proc/array.c b/fs/proc/array.c index cd3653e4f35c..f1e0e69d19f9 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, int g; struct fdtable *fdt = NULL; const struct cred *cred; + const struct upid *upid; pid_t ppid, tpid; rcu_read_lock(); @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, if (tracer) tpid = task_pid_nr_ns(tracer, ns); } + upid = pid_upid_ns(pid, ns); cred = get_task_cred(p); seq_printf(m, State:\t%s\n Tgid:\t%d\n Ngid:\t%d\n Pid:\t%d\n + Highpid:\t%llu\n PPid:\t%d\n TracerPid:\t%d\n Uid:\t%d\t%d\t%d\t%d\n @@ -183,7 +186,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, get_task_state(p), task_tgid_nr_ns(p, ns), task_numa_group_id(p), - pid_nr_ns(pid, ns), + upid ? upid-nr : 0, upid ? upid-highnr : 0, ppid, tpid, from_kuid_munged(user_ns, cred-uid), from_kuid_munged(user_ns, cred-euid), diff --git a/include/linux/pid.h b/include/linux/pid.h index 23705a53abba..ece70b64d04c 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -51,6 +51,7 @@ struct upid { /* Try to keep pid_chain in the same cacheline as nr for find_vpid */ int nr; struct pid_namespace *ns; + u64 highnr; struct hlist_node pid_chain; }; @@ -170,6 +171,7 @@ static inline pid_t pid_nr(struct pid *pid) } pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns); +const struct upid *pid_upid_ns(struct pid *pid, struct pid_namespace *ns); pid_t pid_vnr(struct pid *pid); #define do_each_pid_task(pid, type, task) \ diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 1997ffc295a7..1f9f0455d7ef 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -26,6 +26,7 @@ struct pid_namespace { struct rcu_head rcu; int last_pid; unsigned int nr_hashed; + atomic64_t next_highpid; struct task_struct *child_reaper; struct kmem_cache *pid_cachep; unsigned int level; diff --git a/kernel/pid.c b/kernel/pid.c index 9b9a26698144..863d11a9bfbf 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -312,6 +312,8 @@ struct pid *alloc_pid(struct pid_namespace *ns) pid-numbers[i].nr = nr; pid-numbers[i].ns = tmp; + pid-numbers[i].highnr = + atomic64_inc_return(tmp-next_highpid) - 1; tmp = tmp-parent; } @@ -492,17 +494,26 @@ struct pid *find_get_pid(pid_t nr) } EXPORT_SYMBOL_GPL(find_get_pid); -pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns) +const struct upid *pid_upid_ns(struct pid *pid
Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid
[Adding CRIU people. Whoops.] On Fri, Nov 28, 2014 at 3:05 PM, Andy Lutomirski l...@amacapital.net wrote: Pid reuse is common, which means that it's difficult or impossible to read information about a pid from /proc without races. This introduces a second number associated with each (task, pidns) pair called highpid. Highpid is a 64-bit number, and, barring extremely unlikely circumstances or outright error, a (highpid, pid) will never be reused. With just this change, a program can open /proc/PID/status, read the Highpid field, and confirm that it has the expected value. If the pid has been reused, then highpid will be different. The initial implementation is straightforward: highpid is simply a 64-bit counter. If a high-end system can fork every 3 ns (which would be amazing, given that just allocating a pid requires at atomic operation), it would take well over 1000 years for highpid to wrap. For CRIU's benefit, the next highpid can be set by a privileged user. NB: The sysctl stuff only works on 64-bit systems. If the approach looks good, I'll fix that somehow. Signed-off-by: Andy Lutomirski l...@amacapital.net --- If this goes in, there's plenty of room to add new interfaces to make this more useful. For example, we could add a fancier tgkill that adds and validates hightgid and highpid, and we might want to add a syscall to read one's own hightgid and highpid. These would be quite useful for pidfiles. David, would this be useful for kdbus? CRIU people: will this be unduly difficult to support in CRIU? If you all think this is a good idea, I'll fix the sysctl stuff and document it. It might also be worth adding Hightgid to status. fs/proc/array.c | 5 - include/linux/pid.h | 2 ++ include/linux/pid_namespace.h | 1 + kernel/pid.c | 19 +++ kernel/pid_namespace.c| 22 ++ 5 files changed, 44 insertions(+), 5 deletions(-) diff --git a/fs/proc/array.c b/fs/proc/array.c index cd3653e4f35c..f1e0e69d19f9 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, int g; struct fdtable *fdt = NULL; const struct cred *cred; + const struct upid *upid; pid_t ppid, tpid; rcu_read_lock(); @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, if (tracer) tpid = task_pid_nr_ns(tracer, ns); } + upid = pid_upid_ns(pid, ns); cred = get_task_cred(p); seq_printf(m, State:\t%s\n Tgid:\t%d\n Ngid:\t%d\n Pid:\t%d\n + Highpid:\t%llu\n PPid:\t%d\n TracerPid:\t%d\n Uid:\t%d\t%d\t%d\t%d\n @@ -183,7 +186,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, get_task_state(p), task_tgid_nr_ns(p, ns), task_numa_group_id(p), - pid_nr_ns(pid, ns), + upid ? upid-nr : 0, upid ? upid-highnr : 0, ppid, tpid, from_kuid_munged(user_ns, cred-uid), from_kuid_munged(user_ns, cred-euid), diff --git a/include/linux/pid.h b/include/linux/pid.h index 23705a53abba..ece70b64d04c 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -51,6 +51,7 @@ struct upid { /* Try to keep pid_chain in the same cacheline as nr for find_vpid */ int nr; struct pid_namespace *ns; + u64 highnr; struct hlist_node pid_chain; }; @@ -170,6 +171,7 @@ static inline pid_t pid_nr(struct pid *pid) } pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns); +const struct upid *pid_upid_ns(struct pid *pid, struct pid_namespace *ns); pid_t pid_vnr(struct pid *pid); #define do_each_pid_task(pid, type, task) \ diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 1997ffc295a7..1f9f0455d7ef 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -26,6 +26,7 @@ struct pid_namespace { struct rcu_head rcu; int last_pid; unsigned int nr_hashed; + atomic64_t next_highpid; struct task_struct *child_reaper; struct kmem_cache *pid_cachep; unsigned int level; diff --git a/kernel/pid.c b/kernel/pid.c index 9b9a26698144..863d11a9bfbf 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -312,6 +312,8 @@ struct pid *alloc_pid(struct pid_namespace *ns) pid-numbers[i].nr = nr; pid-numbers[i].ns = tmp; + pid-numbers[i].highnr = + atomic64_inc_return(tmp-next_highpid) - 1; tmp = tmp-parent
Re: [systemd-devel] [PATCH] udev: Add hidraw_id and a rule file to invoke it
On Mon, Nov 3, 2014 at 5:32 AM, Tom Gundersen t...@jklm.no wrote: Hi Andy, On Tue, Oct 28, 2014 at 11:46 PM, Andy Lutomirski l...@amacapital.net wrote: So far, hidraw_id detects U2F tokens and sets: ID_U2F_TOKEN=1 ID_SECURITY_TOKEN=1 This causes the uaccess rules to apply to U2F devices. --- I've never written any udev code before. Feedback welcome. If you think this doesn't belong in udev, I can try to find it another home. As mentioned, I don't think we should do this in core udev code (udev rule would be ok), but making it generic and exposing all the usages sounds useful. I don't have a strong opinion on whether this should be in udev or in the kernel, but if you end up going with udev, here are some comments. Thanks. I'll wait to see what Jiri thinks before sending an updated version. Also, we don't want to add any more external helpers to the udev codebase, so we should just make this a builtin (analogous to udev-builtin-usb_id.c). Minor comments inline. [...] diff --git a/src/udev/hidraw_id/hidraw_id.c b/src/udev/hidraw_id/hidraw_id.c new file mode 100644 index ..e32f222f22f9 --- /dev/null +++ b/src/udev/hidraw_id/hidraw_id.c @@ -0,0 +1,144 @@ +/* + * Copyright (c) Andrew Lutomirski, 2014 + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by New udev code must be LGPL2+. OK. [snipped things I'll change if I resubmit] +/* Long item; skip it. */ +if (i + 1 = desclen) { +log_error(bad report_descriptor); +goto out; +} +i += (desc[i+1] + 3); /* Can't overflow. */ +continue; +} + +size = (sizecode 3 ? sizecode : 4); + +if (i + 1 + size desclen) { +log_error(bad report_descriptor); +goto out; +} + +for (j = 0; j size; j++) +value |= (desc[i + 1 + j] 8*j); + +if (type == 1 tag == 0) +usage_page = value; Should we verify that the data has the right length (2 or 4 bytes)? I guess we also need to handle 2-byte usage pages specially? I think we're not supposed to check the length in general, but I should re-read the part about two-byte usage pages. (Although, off the top of my head, I thought that the special case was for very long usages, not usage pages.) [...] --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Supporting U2F over HID on Linux?
On Mon, Nov 3, 2014 at 11:03 AM, David Herrmann dh.herrm...@gmail.com wrote: Hi On Sun, Nov 2, 2014 at 7:57 PM, Andy Lutomirski l...@amacapital.net wrote: I want to get U2F (universal second factor, sometimes called security key or even gnubby) working on Linux. U2F tokens are HID devices that speak a custom protocol. The intent is that user code will speak to then using something like HIDAPI. The trick is that, for HIDAPI to work, something needs to recognize these devices and get udev to set appropriate device permissions. [snip] - An actual kernel driver for U2F devices using the hid group mechanism for enumeration. This seems overcomplicated. Imho, this is the way to go. Create a proper char-dev for U2F, create an API and make it work. We had this discussion earlier about vendor-extensions that should be writable via hidraw from user-space. This turned out to be really messy.. and was discussed for several weeks straight. hidraw just wasn't designed as unprivileged user-space API. For instance, what happens if a device provides U2F plus something else? Both will be on the same hidraw device. We could split hidraw per usage, but I don't see how that is superior to a proper U2F API. And once one usage can affect a device as a whole (like power-off), you're screwed. Agreed, mostly. My only real concern is that this could be annoying for the userspace developers who will need to target Linux and HIDAPI separately. Admittedly the Linux support will be trivial. I can give the driver a try. It'll actually be simpler than the spec makes it out, since a real kernel driver will have no need to arbitrate with itself. --Andy Just look at the libusb mess where some devices are handled in the kernel and some in user-space (eg., see Gnome cheese, media devices, ...). I don't think we should repeat that with HID. Thanks David -- Andy Lutomirski AMA Capital Management, LLC ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Supporting U2F over HID on Linux?
On Mon, Nov 3, 2014 at 12:21 PM, Jiri Kosina jkos...@suse.cz wrote: On Mon, 3 Nov 2014, David Herrmann wrote: Agreed, mostly. My only real concern is that this could be annoying for the userspace developers who will need to target Linux and HIDAPI separately. Admittedly the Linux support will be trivial. I see. I'll not stop you from using hidraw, I'd just not recommend it. Especially for security stuff I dislike exposing the whole HID device writable. But yeah, I guess you got my point. Alright, I am basically thinking loudly now, but how about we allow HID drivers that would override default hidraw interface? I am very well aware of the fact that this could be opening a can of worms, so we'll have to make it very restrictive. How about, let's say, we allow HID drivers to provide custom hidraw interface (completely overriding the one that HID core would normally create) only for cases such as: - the intent is to expose only certain parts of a combined device - the intent is to introduce some level of access control I.e. still no interference of kernel with data parsing allowed. Hmm. Would this be like a filter on hidraw actions? How would udev distinguish these special hidraw devices from normal hidraw devices? Also, for U2F, this could be a little awkward. There's some crazy fragmentation stuff to allow a U2F request to be split across HID requests, and I think a kernel driver would much rather get the original unfragmented application request. --Andy I can give the driver a try. It'll actually be simpler than the spec makes it out, since a real kernel driver will have no need to arbitrate with itself. I'll gladly review any custom HID drivers. Feel free to put me on CC when sending to linux-input. There's also a lot of examples in drivers/hid/ in case you need a head-start (especially if you need the raw_event callback). Same here, of course. Please always CC me in parallel to sending to linux-input@ to make sure that the patch doesn't fall in between cracks. Thanks, -- Jiri Kosina SUSE Labs -- Andy Lutomirski AMA Capital Management, LLC ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Supporting U2F over HID on Linux?
I want to get U2F (universal second factor, sometimes called security key or even gnubby) working on Linux. U2F tokens are HID devices that speak a custom protocol. The intent is that user code will speak to then using something like HIDAPI. The trick is that, for HIDAPI to work, something needs to recognize these devices and get udev to set appropriate device permissions. My question is: how should this be done? The official way to enumerate U2F devices is to look for a HID usage page 0xf1d0 containing usage 0x1. Options include: - A builtin udev helper that reads the sysfs report_descriptor for hid or hidraw devices and sets attributes accordingly (either ID_SECURITY_TOKEN or something more general). - A udev helper that does this and doesn't live in the systemd tree. I don't love this option -- I'd prefer for this to be as plug-and-play as possible. - HID core code in the kernel to add HID_USAGES=f1d1:lots:of:other:things to the uevent (or udev code to do the same). This might end up producing a rather long string or some devices. - An actual kernel driver for U2F devices using the hid group mechanism for enumeration. This seems overcomplicated. Concretely, my U2F device is: /devices/pci:00/:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:1050:0120.0011/hidraw/hidraw0 The 0xf1d0 thing is enumerable on /devices/pci:00/:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:1050:0120.0011. Thoughts? --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Supporting U2F over HID on Linux?
On Sun, Nov 2, 2014 at 12:42 PM, Jiri Kosina jkos...@suse.cz wrote: On Sun, 2 Nov 2014, Andy Lutomirski wrote: I want to get U2F (universal second factor, sometimes called security key or even gnubby) working on Linux. U2F tokens are HID devices that speak a custom protocol. The intent is that user code will speak to then using something like HIDAPI. The trick is that, for HIDAPI to work, something needs to recognize these devices and get udev to set appropriate device permissions. Just to make sure we are on the same page -- this is really only about setting proper device node permissions and nothing else, right? Yes. My question is: how should this be done? The official way to enumerate U2F devices is to look for a HID usage page 0xf1d0 containing usage 0x1. Options include: - A builtin udev helper that reads the sysfs report_descriptor for hid or hidraw devices and sets attributes accordingly (either ID_SECURITY_TOKEN or something more general). Hmmm ... please keep in mind that report_descriptor is actually in debugfs, so it's a bit questionable whether you can rely on it being present on well-defined location on all systems. Huh? I have /sys/devices/pci:00/:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:1050:0120.0013/report_descriptor on my system. You can get it from different other places though, such as libusb (but then you are limited only to USB HID devices ... which might be enough in your particular case). I have no idea whether there will ever be a bluetooth U2F device. - A udev helper that does this and doesn't live in the systemd tree. I don't love this option -- I'd prefer for this to be as plug-and-play as possible. Agreed. - HID core code in the kernel to add HID_USAGES=f1d1:lots:of:other:things to the uevent (or udev code to do the same). This might end up producing a rather long string or some devices. We have been thinking about this quite a lot in the past, and decided to postpone this until there is a very good reason for it to happen. I'm not sure that U2F counts as a very good reason. - An actual kernel driver for U2F devices using the hid group mechanism for enumeration. This seems overcomplicated. Hmmm ... why do you think so? I believe it actually should be really super-trivial. The HID group part is trivial, but what interface would the driver expose? I don't think that a udev rule for hidraw matching the modalias makes any sense, and, if it were a real driver, what interface would it expose? Most cross-platform userspace code for U2F is likely to use HIDAPI. Admittedly, a U2F driver in the kernel would be slightly nicer from a multiplexing standpoint than hidraw. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Supporting U2F over HID on Linux?
On Sun, Nov 2, 2014 at 12:47 PM, Tom Gundersen t...@jklm.no wrote: Hi Andy, On Sun, Nov 2, 2014 at 7:57 PM, Andy Lutomirski l...@amacapital.net wrote: I want to get U2F (universal second factor, sometimes called security key or even gnubby) working on Linux. U2F tokens are HID devices that speak a custom protocol. The intent is that user code will speak to then using something like HIDAPI. The trick is that, for HIDAPI to work, something needs to recognize these devices and get udev to set appropriate device permissions. My question is: how should this be done? The official way to enumerate U2F devices is to look for a HID usage page 0xf1d0 containing usage 0x1. Options include: - A builtin udev helper that reads the sysfs report_descriptor for hid or hidraw devices and sets attributes accordingly (either ID_SECURITY_TOKEN or something more general). I don't think we should have such special-purpose logic in the udev core. [...] - HID core code in the kernel to add HID_USAGES=f1d1:lots:of:other:things to the uevent (or udev code to do the same). This might end up producing a rather long string or some devices. This makes the most sense to me. We could put this logic (adapting the patch you posted) in src/udev/udev-builtin-usb_id.c. How would that work? Can't a USB device have more than one HID class device under it? I could imagine a future U2F keyboard that has a HID class device for the keyboard part and another one for U2F. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Supporting U2F over HID on Linux?
On Sun, Nov 2, 2014 at 2:45 PM, Benjamin Tissoires benjamin.tissoi...@gmail.com wrote: On Sun, Nov 2, 2014 at 4:40 PM, Jiri Kosina jkos...@suse.cz wrote: On Sun, 2 Nov 2014, Jiri Kosina wrote: Alternatively, you can just write udev rule which triggers on HID devices, issues HIDIOCGRDESC ioctl() on the just-created hidraw node, and decides afterwards whether node permissions need to be altered ... right? Just to make myself clear here -- this is basically alternative to your 1st option, but for cases where you want to make yourself independent on sysfs existence (I don't what is the craziness level of setups you are considering). I am a little bit concerned with the user space retrieving the descriptor, parsing it and then deciding how to set the permissions. It's not that it is super complex, but it will duplicate the parsing we already do in the kernel. Wacom tried to do that in their usb driver, and they never managed to fully implement one. I think the HID_GROUP solution is super trivial: - add a match on the U2F input report and set the group to HID_GROUP_U2F (2 lines in hid_scan_input_usage() in hid-core.c) - allow hid-generic to also match HID_GROUP_U2F (1 line in hid-generic.c). Then, the udev rule would be super trivial. I'm confused. What would the user-visible effect of this be? Would the hidraw node still show up? What is user code supposed to match to detect a U2F device or to otherwise set permissions? --Andy Also, if we want to further extend the kernel API for U2F, the group will already be in place. Cheers, Benjamin -- Andy Lutomirski AMA Capital Management, LLC ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Supporting U2F over HID on Linux?
On Sun, Nov 2, 2014 at 3:01 PM, Benjamin Tissoires benjamin.tissoi...@gmail.com wrote: On Sun, Nov 2, 2014 at 5:49 PM, Andy Lutomirski l...@amacapital.net wrote: On Sun, Nov 2, 2014 at 2:45 PM, Benjamin Tissoires benjamin.tissoi...@gmail.com wrote: On Sun, Nov 2, 2014 at 4:40 PM, Jiri Kosina jkos...@suse.cz wrote: On Sun, 2 Nov 2014, Jiri Kosina wrote: Alternatively, you can just write udev rule which triggers on HID devices, issues HIDIOCGRDESC ioctl() on the just-created hidraw node, and decides afterwards whether node permissions need to be altered ... right? Just to make myself clear here -- this is basically alternative to your 1st option, but for cases where you want to make yourself independent on sysfs existence (I don't what is the craziness level of setups you are considering). I am a little bit concerned with the user space retrieving the descriptor, parsing it and then deciding how to set the permissions. It's not that it is super complex, but it will duplicate the parsing we already do in the kernel. Wacom tried to do that in their usb driver, and they never managed to fully implement one. I think the HID_GROUP solution is super trivial: - add a match on the U2F input report and set the group to HID_GROUP_U2F (2 lines in hid_scan_input_usage() in hid-core.c) - allow hid-generic to also match HID_GROUP_U2F (1 line in hid-generic.c). Then, the udev rule would be super trivial. I'm confused. What would the user-visible effect of this be? Would the hidraw node still show up? What is user code supposed to match to detect a U2F device or to otherwise set permissions? The only change with what we currently have is that the modalias of the device will be something like MODALIAS=hid:b0003gHID_GROUP_U2Fvp. (replace HID_GROUP_U2F with the proper hex value). So matching against this modalias is trivial, and you can just put the hidraw node rw for users. Do we really want udev matching against MODALIAS, and do we really want udev rules to hardcode hid group constants? I'd like this idea better if we added a HID_GROUP uevent property with a textual description, or perhaps something a little more specific. The hid group field seems to be used for different types of things. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Supporting U2F over HID on Linux?
On Sun, Nov 2, 2014 at 4:40 PM, Benjamin Tissoires benjamin.tissoi...@gmail.com wrote: On Sun, Nov 2, 2014 at 6:34 PM, Andy Lutomirski l...@amacapital.net wrote: On Sun, Nov 2, 2014 at 3:01 PM, Benjamin Tissoires benjamin.tissoi...@gmail.com wrote: On Sun, Nov 2, 2014 at 5:49 PM, Andy Lutomirski l...@amacapital.net wrote: On Sun, Nov 2, 2014 at 2:45 PM, Benjamin Tissoires benjamin.tissoi...@gmail.com wrote: On Sun, Nov 2, 2014 at 4:40 PM, Jiri Kosina jkos...@suse.cz wrote: On Sun, 2 Nov 2014, Jiri Kosina wrote: Alternatively, you can just write udev rule which triggers on HID devices, issues HIDIOCGRDESC ioctl() on the just-created hidraw node, and decides afterwards whether node permissions need to be altered ... right? Just to make myself clear here -- this is basically alternative to your 1st option, but for cases where you want to make yourself independent on sysfs existence (I don't what is the craziness level of setups you are considering). I am a little bit concerned with the user space retrieving the descriptor, parsing it and then deciding how to set the permissions. It's not that it is super complex, but it will duplicate the parsing we already do in the kernel. Wacom tried to do that in their usb driver, and they never managed to fully implement one. I think the HID_GROUP solution is super trivial: - add a match on the U2F input report and set the group to HID_GROUP_U2F (2 lines in hid_scan_input_usage() in hid-core.c) - allow hid-generic to also match HID_GROUP_U2F (1 line in hid-generic.c). Then, the udev rule would be super trivial. I'm confused. What would the user-visible effect of this be? Would the hidraw node still show up? What is user code supposed to match to detect a U2F device or to otherwise set permissions? The only change with what we currently have is that the modalias of the device will be something like MODALIAS=hid:b0003gHID_GROUP_U2Fvp. (replace HID_GROUP_U2F with the proper hex value). So matching against this modalias is trivial, and you can just put the hidraw node rw for users. Do we really want udev matching against MODALIAS, and do we really want udev rules to hardcode hid group constants? That's a good question. I don't think the hid group will change in the future and we can guarantee that in the kernel I think. I'd like this idea better if we added a HID_GROUP uevent property with a textual description, or perhaps something a little more specific. This refers back to Jiri's first remark. Adding such a thing is doable, but do we really want/need it :/ I would tend to think that designing a HID interface for userspace consumption might be a hint that it's the right time to give it a better name than MODALIAS :) --Andy The hid group field seems to be used for different types of things. yes, my proposal is definitively a (ugly) hack around the groups which are used to select which hid subdriver we use. An other question which comes to my mind is don't we want logind to assign the hidraw node to the proper client? We may still need to tag the device somehow, but if logind prevents untrusted application to run arbitrary code on the u2f token, that might be a little bit safer than allowing anybody to read/write. We do want it to be assigned to the proper client. My udev patch (which will almost certainly not be accepted as-is, but it could be fixed up) uses the report_descriptor to set ID_SECURITY_TOKEN=1, which in turn causes an existing rule to set TAG+=uaccess, which causes logind to do its thing. Sorry to raise more questions than providing answers. No problem. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH] udev: Add hidraw_id and a rule file to invoke it
On Tue, Oct 28, 2014 at 3:46 PM, Andy Lutomirski l...@amacapital.net wrote: So far, hidraw_id detects U2F tokens and sets: ID_U2F_TOKEN=1 ID_SECURITY_TOKEN=1 This causes the uaccess rules to apply to U2F devices. This works for the Plug-up security key, too. --Andy --- I've never written any udev code before. Feedback welcome. If you think this doesn't belong in udev, I can try to find it another home. .gitignore | 1 + Makefile.am| 11 rules/60-hidraw.rules | 7 ++ src/udev/hidraw_id/Makefile| 1 + src/udev/hidraw_id/hidraw_id.c | 144 + 5 files changed, 164 insertions(+) create mode 100644 rules/60-hidraw.rules create mode 12 src/udev/hidraw_id/Makefile create mode 100644 src/udev/hidraw_id/hidraw_id.c diff --git a/.gitignore b/.gitignore index f119b574c777..4bd3cdf08f0d 100644 --- a/.gitignore +++ b/.gitignore @@ -34,6 +34,7 @@ /exported /exported-* /gtk-doc.make +/hidraw_id /hostnamectl /install-tree /journalctl diff --git a/Makefile.am b/Makefile.am index fae946a388af..9f64687d32b1 100644 --- a/Makefile.am +++ b/Makefile.am @@ -3542,6 +3542,17 @@ udevlibexec_PROGRAMS += \ ata_id # -- +hidraw_id_SOURCES = \ + src/udev/hidraw_id/hidraw_id.c + +hidraw_id_LDADD = \ + libudev-internal.la \ + libsystemd-shared.la + +udevlibexec_PROGRAMS += \ + hidraw_id + +# -- cdrom_id_SOURCES = \ src/udev/cdrom_id/cdrom_id.c diff --git a/rules/60-hidraw.rules b/rules/60-hidraw.rules new file mode 100644 index ..1ee9c812f711 --- /dev/null +++ b/rules/60-hidraw.rules @@ -0,0 +1,7 @@ +# do not edit this file, it will be overwritten on update + +ACTION==remove, GOTO=hidraw_end + +SUBSYSTEM==hidraw, IMPORT{program}=hidraw_id --udev + +LABEL=keyboard_end diff --git a/src/udev/hidraw_id/Makefile b/src/udev/hidraw_id/Makefile new file mode 12 index ..d0b0e8e0086f --- /dev/null +++ b/src/udev/hidraw_id/Makefile @@ -0,0 +1 @@ +../Makefile \ No newline at end of file diff --git a/src/udev/hidraw_id/hidraw_id.c b/src/udev/hidraw_id/hidraw_id.c new file mode 100644 index ..e32f222f22f9 --- /dev/null +++ b/src/udev/hidraw_id/hidraw_id.c @@ -0,0 +1,144 @@ +/* + * Copyright (c) Andrew Lutomirski, 2014 + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include stdio.h +#include string.h +#include sys/types.h +#include sys/stat.h +#include fcntl.h +#include unistd.h + +#include libudev.h +#include libudev-private.h + +_printf_(6,0) +static void log_fn(struct udev *udev, int priority, + const char *file, int line, const char *fn, + const char *format, va_list args) +{ +log_metav(priority, file, line, fn, format, args); +} + +int main(int argc, char **argv) +{ +struct udev *udev; +struct udev_device *dev, *hiddev; +char path[4096]; +unsigned char desc[4096]; +int desclen; +int fd = -1; +int i; +int ret = 1; +unsigned int usage_page = 0; +int is_u2f_token = 0; + +if (argc != 2) { +fprintf(stderr, Usage: hidraw_id SYSFS_PATH|--udev\n); +return 1; +} + +log_parse_environment(); +log_open(); + +udev = udev_new(); + +udev_set_log_fn(udev, log_fn); + +if (!strcmp(argv[1], --udev)) +dev = udev_device_new_from_environment(udev); +else +dev = udev_device_new_from_syspath(udev, argv[1]); + +if (!dev) +goto out; + +hiddev = udev_device_get_parent(dev); +if (!hiddev) +goto out; + +if (snprintf(path, sizeof(path), %s/report_descriptor, + udev_device_get_syspath(hiddev)) (int)sizeof(path)) +return 1; + +fd = open(path, O_RDONLY | O_NOFOLLOW); +if (fd == -1) +goto out; + +desclen = read(fd, desc, sizeof(desc)); +if (desclen = 0
Re: [systemd-devel] Writing a udev rule for U2F security tokens?
On Tue, Oct 28, 2014 at 1:40 AM, Greg KH gre...@linuxfoundation.org wrote: On Mon, Oct 27, 2014 at 04:37:14PM -0700, Andy Lutomirski wrote: On Mon, Oct 27, 2014 at 4:32 PM, Greg KH gre...@linuxfoundation.org wrote: On Mon, Oct 27, 2014 at 04:12:30PM -0700, Andy Lutomirski wrote: Hi- I'd like to write a generic udev rule for U2F security tokens and to possibly get it integrated into systemd / udev, but I'm not sure how to write it in the first place. U2F tokens are USB HID devices that have a usage page 0xF1D0 that contains usage 0x01. The rule should match any hidraw device with that property. Can this be done without a user helper? Is there an existing helper in which it would make sense to add such a check? Here's the draft USB forum allocation: http://www.usb.org/developers/hidpage/HUTRR48.pdf Here's the draft spec from the FIDO Alliance: https://fidoalliance.org/specs/fido-u2f-HID-protocol-v1.0-rd-20141008.pdf In practice, I expect little change between the draft and final specs, since these devices are already for sale and Chromium supports them. I don't understand, what would a udev rule do with these devices? Shouldn't they be exported automatically using the hid raw interface so that userspace can talk to them? What else needs to be done? Wow, I clearly failed to transfer my thoughts into email... I want to set ID_SECURITY_TOKEN=1 or, more generally, cause the uaccess tag to be set so that users have permission to use the token. This rule works in Fedora for the existing tokens by Yubico: KERNEL==hidraw*, SUBSYSTEM==hidraw, ATTRS{idVendor}==1050, ATTRS{idProduct}==0113|0114|0115|0116|0120, ENV{ID_SECURITY_TOKEN}=1 but it won't work for other brands of U2F token. If there's no sysfs attribute that you can read directly to determine that it is a a U2F token, then it's not easy to write a udev rule. You can write a simple program to read the hid pages from the hidraw interface, and then set an environment variable from there if the FIDO Alliance Page is present. You can use a udev rule for that, but it will have to be an external tool. Would a tool like that be considered appropriate to distribute with udev? It would have somewhat unpleasant overhead for what is currently a niche use case. I suppose the kernel could also be modified to expose this, but doing that cleanly will involve exposing all the usage pages in sysfs, which is more complexity than I really want to add. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH] udev: Add hidraw_id and a rule file to invoke it
So far, hidraw_id detects U2F tokens and sets: ID_U2F_TOKEN=1 ID_SECURITY_TOKEN=1 This causes the uaccess rules to apply to U2F devices. --- I've never written any udev code before. Feedback welcome. If you think this doesn't belong in udev, I can try to find it another home. .gitignore | 1 + Makefile.am| 11 rules/60-hidraw.rules | 7 ++ src/udev/hidraw_id/Makefile| 1 + src/udev/hidraw_id/hidraw_id.c | 144 + 5 files changed, 164 insertions(+) create mode 100644 rules/60-hidraw.rules create mode 12 src/udev/hidraw_id/Makefile create mode 100644 src/udev/hidraw_id/hidraw_id.c diff --git a/.gitignore b/.gitignore index f119b574c777..4bd3cdf08f0d 100644 --- a/.gitignore +++ b/.gitignore @@ -34,6 +34,7 @@ /exported /exported-* /gtk-doc.make +/hidraw_id /hostnamectl /install-tree /journalctl diff --git a/Makefile.am b/Makefile.am index fae946a388af..9f64687d32b1 100644 --- a/Makefile.am +++ b/Makefile.am @@ -3542,6 +3542,17 @@ udevlibexec_PROGRAMS += \ ata_id # -- +hidraw_id_SOURCES = \ + src/udev/hidraw_id/hidraw_id.c + +hidraw_id_LDADD = \ + libudev-internal.la \ + libsystemd-shared.la + +udevlibexec_PROGRAMS += \ + hidraw_id + +# -- cdrom_id_SOURCES = \ src/udev/cdrom_id/cdrom_id.c diff --git a/rules/60-hidraw.rules b/rules/60-hidraw.rules new file mode 100644 index ..1ee9c812f711 --- /dev/null +++ b/rules/60-hidraw.rules @@ -0,0 +1,7 @@ +# do not edit this file, it will be overwritten on update + +ACTION==remove, GOTO=hidraw_end + +SUBSYSTEM==hidraw, IMPORT{program}=hidraw_id --udev + +LABEL=keyboard_end diff --git a/src/udev/hidraw_id/Makefile b/src/udev/hidraw_id/Makefile new file mode 12 index ..d0b0e8e0086f --- /dev/null +++ b/src/udev/hidraw_id/Makefile @@ -0,0 +1 @@ +../Makefile \ No newline at end of file diff --git a/src/udev/hidraw_id/hidraw_id.c b/src/udev/hidraw_id/hidraw_id.c new file mode 100644 index ..e32f222f22f9 --- /dev/null +++ b/src/udev/hidraw_id/hidraw_id.c @@ -0,0 +1,144 @@ +/* + * Copyright (c) Andrew Lutomirski, 2014 + * + * This program is free software: you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation, either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include stdio.h +#include string.h +#include sys/types.h +#include sys/stat.h +#include fcntl.h +#include unistd.h + +#include libudev.h +#include libudev-private.h + +_printf_(6,0) +static void log_fn(struct udev *udev, int priority, + const char *file, int line, const char *fn, + const char *format, va_list args) +{ +log_metav(priority, file, line, fn, format, args); +} + +int main(int argc, char **argv) +{ +struct udev *udev; +struct udev_device *dev, *hiddev; +char path[4096]; +unsigned char desc[4096]; +int desclen; +int fd = -1; +int i; +int ret = 1; +unsigned int usage_page = 0; +int is_u2f_token = 0; + +if (argc != 2) { +fprintf(stderr, Usage: hidraw_id SYSFS_PATH|--udev\n); +return 1; +} + +log_parse_environment(); +log_open(); + +udev = udev_new(); + +udev_set_log_fn(udev, log_fn); + +if (!strcmp(argv[1], --udev)) +dev = udev_device_new_from_environment(udev); +else +dev = udev_device_new_from_syspath(udev, argv[1]); + +if (!dev) +goto out; + +hiddev = udev_device_get_parent(dev); +if (!hiddev) +goto out; + +if (snprintf(path, sizeof(path), %s/report_descriptor, + udev_device_get_syspath(hiddev)) (int)sizeof(path)) +return 1; + +fd = open(path, O_RDONLY | O_NOFOLLOW); +if (fd == -1) +goto out; + +desclen = read(fd, desc, sizeof(desc)); +if (desclen = 0) +goto out; + +/* Parse the report descriptor. */ +for (i = 0; i desclen; ) { +unsigned char tag = desc[i] 4; +unsigned char type = (desc[i] 2) 0x3; +unsigned char sizecode = desc[i] 0x3; +int size, j;
[systemd-devel] Writing a udev rule for U2F security tokens?
Hi- I'd like to write a generic udev rule for U2F security tokens and to possibly get it integrated into systemd / udev, but I'm not sure how to write it in the first place. U2F tokens are USB HID devices that have a usage page 0xF1D0 that contains usage 0x01. The rule should match any hidraw device with that property. Can this be done without a user helper? Is there an existing helper in which it would make sense to add such a check? Here's the draft USB forum allocation: http://www.usb.org/developers/hidpage/HUTRR48.pdf Here's the draft spec from the FIDO Alliance: https://fidoalliance.org/specs/fido-u2f-HID-protocol-v1.0-rd-20141008.pdf In practice, I expect little change between the draft and final specs, since these devices are already for sale and Chromium supports them. Thanks, Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Writing a udev rule for U2F security tokens?
On Mon, Oct 27, 2014 at 4:32 PM, Greg KH gre...@linuxfoundation.org wrote: On Mon, Oct 27, 2014 at 04:12:30PM -0700, Andy Lutomirski wrote: Hi- I'd like to write a generic udev rule for U2F security tokens and to possibly get it integrated into systemd / udev, but I'm not sure how to write it in the first place. U2F tokens are USB HID devices that have a usage page 0xF1D0 that contains usage 0x01. The rule should match any hidraw device with that property. Can this be done without a user helper? Is there an existing helper in which it would make sense to add such a check? Here's the draft USB forum allocation: http://www.usb.org/developers/hidpage/HUTRR48.pdf Here's the draft spec from the FIDO Alliance: https://fidoalliance.org/specs/fido-u2f-HID-protocol-v1.0-rd-20141008.pdf In practice, I expect little change between the draft and final specs, since these devices are already for sale and Chromium supports them. I don't understand, what would a udev rule do with these devices? Shouldn't they be exported automatically using the hid raw interface so that userspace can talk to them? What else needs to be done? Wow, I clearly failed to transfer my thoughts into email... I want to set ID_SECURITY_TOKEN=1 or, more generally, cause the uaccess tag to be set so that users have permission to use the token. This rule works in Fedora for the existing tokens by Yubico: KERNEL==hidraw*, SUBSYSTEM==hidraw, ATTRS{idVendor}==1050, ATTRS{idProduct}==0113|0114|0115|0116|0120, ENV{ID_SECURITY_TOKEN}=1 but it won't work for other brands of U2F token. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH] udev: fail firmware loading immediately if no search path is defined
On Wed, Aug 7, 2013 at 12:52 AM, Maarten Lankhorst m.b.lankho...@gmail.com wrote: Op 07-08-13 02:26, Andy Lutomirski schreef: On Tue, Aug 6, 2013 at 5:24 PM, Tom Gundersen t...@jklm.no wrote: On 6 Aug 2013 18:32, Bryan Kadzban br...@kadzban.is-a-geek.net wrote: On Tue, Aug 06, 2013 at 11:17:17AM +0200, Tom Gundersen wrote: On Tue, Aug 6, 2013 at 11:11 AM, Tom Gundersen t...@jklm.no wrote: On Tue, Aug 6, 2013 at 10:20 AM, Maarten Lankhorst m.b.lankho...@gmail.com wrote: Op 05-08-13 18:29, Andy Lutomirski schreef: The systemd commit below can delay firmware loading by multiple minutes if CONFIG_FW_LOADER_USER_HELPER=y. Unfortunately no one noticed that the systemd-udev change would break new kernels as well as old kernels. Since the kernel apparently can't count on reasonable userspace support, turn this thing off by default. commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a Author: Tom Gundersen t...@jklm.no Date: Mon Mar 18 15:12:18 2013 +0100 udev: make firmware loading optional and disable by default Distros that whish to support old kernels should set --with-firmware-dirs=/usr/lib/firmware/updates:/usr/lib/firmware to retain the old behaviour. methinks this patch should be reverted then, Well, all the code is still there, so it can be enabled if anyone wants it. or a stub should be added to udev to always fail firmware loading so timeouts don't occur. I think the only use (if any) of a userspace firmware loader would be for anyone who wants a custom one (i.e., not udev), so we shouldn't just fail the loading from udev unconditionally. How about we just improve the udev documentation a bit, similar to Andy's kernel patch? Sorry, I should first have checked. We already document this in the README: Userspace firmware loading is deprecated, will go away, and sometimes causes problems: CONFIG_FW_LOADER_USER_HELPER=n ...And this patch is making the kernel default to the correct behavior, instead of the now-broken-by-udev behavior. I'm not sure I see the issue with it? :-) Oh yeah this patch is totally the right thing to do, I was just arguing that there is nothing to be done on the udev side. (Add me to the list of people that think udev is broken too, fwiw. But let's at least not leave *both* sides in a broken-by-default state.) Well I don't think it is too much to ask that the kernel and udev should be configured in a consistent way. Especially as thing still work even if you get it wrong, albeit with a delay. Except that the current defaults are inconsistent and there is no explanation anywhere in the logs when this is screwed up. So what is wrong with my 'fail in udev immediately if not configured' idea? In that case it doesn't matter whether CONFIG_FW_LOADER_USER_HELPER is set or not. You could even print a useful message for the user in udev to the log, so they have an idea of what happened. Breaking udev on older still supported kernels by default without printing any debug info is silly, and the only cost is a small increase in disk space when unused. I did so in below patch. Seems reasonable to me. It might be worth adding something to the message to suggest turning off CONFIG_FW_LOADER_USER_HELPER. ~Maarten I converted ELEMENTSOF to != ELEMENTSOF in the loop to work correctly when the array is empty. Most code in udev-builtin-firmware is eliminated at -O2 optimization level when FIRMWARE_PATH is not explicitly set. 8 diff --git a/Makefile.am b/Makefile.am index b8b8d06..2097629 100644 --- a/Makefile.am +++ b/Makefile.am @@ -2235,6 +2235,7 @@ libudev_core_la_SOURCES = \ src/udev/udev-ctrl.c \ src/udev/udev-builtin.c \ src/udev/udev-builtin-btrfs.c \ + src/udev/udev-builtin-firmware.c \ src/udev/udev-builtin-hwdb.c \ src/udev/udev-builtin-input_id.c \ src/udev/udev-builtin-keyboard.c \ @@ -2271,14 +2272,6 @@ libudev_core_la_CPPFLAGS = \ $(AM_CPPFLAGS) \ -DFIRMWARE_PATH=$(FIRMWARE_PATH) -if ENABLE_FIRMWARE -libudev_core_la_SOURCES += \ - src/udev/udev-builtin-firmware.c - -dist_udevrules_DATA += \ - rules/50-firmware.rules -endif - if HAVE_KMOD libudev_core_la_SOURCES += \ src/udev/udev-builtin-kmod.c diff --git a/configure.ac b/configure.ac index 0ecc716..dc7a3e3 100644 --- a/configure.ac +++ b/configure.ac @@ -823,8 +823,6 @@ for i in $with_firmware_path; do done IFS=$OLD_IFS AC_SUBST(FIRMWARE_PATH) -AS_IF([test x${FIRMWARE_PATH} != x], [ AC_DEFINE(HAVE_FIRMWARE, 1, [Define if FIRMWARE is available]) ]) -AM_CONDITIONAL(ENABLE_FIRMWARE, [test x${FIRMWARE_PATH} != x]) # -- AC_ARG_ENABLE([gudev], diff --git a/rules/50-firmware.rules b/rules/50-firmware.rules deleted file mode 100644 index f0ae684..000 --- a/rules/50-firmware.rules +++ /dev/null
Re: [systemd-devel] [PATCH] Change CONFIG_FW_LOADER_USER_HELPER to default n and don't select it
On Tue, Aug 6, 2013 at 2:17 AM, Tom Gundersen t...@jklm.no wrote: On Tue, Aug 6, 2013 at 11:11 AM, Tom Gundersen t...@jklm.no wrote: On Tue, Aug 6, 2013 at 10:20 AM, Maarten Lankhorst m.b.lankho...@gmail.com wrote: Op 05-08-13 18:29, Andy Lutomirski schreef: The systemd commit below can delay firmware loading by multiple minutes if CONFIG_FW_LOADER_USER_HELPER=y. Unfortunately no one noticed that the systemd-udev change would break new kernels as well as old kernels. Since the kernel apparently can't count on reasonable userspace support, turn this thing off by default. commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a Author: Tom Gundersen t...@jklm.no Date: Mon Mar 18 15:12:18 2013 +0100 udev: make firmware loading optional and disable by default Distros that whish to support old kernels should set --with-firmware-dirs=/usr/lib/firmware/updates:/usr/lib/firmware to retain the old behaviour. methinks this patch should be reverted then, Well, all the code is still there, so it can be enabled if anyone wants it. or a stub should be added to udev to always fail firmware loading so timeouts don't occur. I think the only use (if any) of a userspace firmware loader would be for anyone who wants a custom one (i.e., not udev), so we shouldn't just fail the loading from udev unconditionally. How about we just improve the udev documentation a bit, similar to Andy's kernel patch? Sorry, I should first have checked. We already document this in the README: Userspace firmware loading is deprecated, will go away, and sometimes causes problems: CONFIG_FW_LOADER_USER_HELPER=n TBH, the udev README is the last thing I'm going to check to figure out why I don't have wifi for a couple minutes after boot. Also, the message is missing the point. It's not that it's deprecated and sometimes causes problems -- it's that udev *changed behavior* and breaks your system if you have CONFIG_FW_LOADER_USER_HELPER=y. If udev logged something (for a couple of years), that would be a different story. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH] Change CONFIG_FW_LOADER_USER_HELPER to default n and don't select it
On Tue, Aug 6, 2013 at 5:24 PM, Tom Gundersen t...@jklm.no wrote: On 6 Aug 2013 18:32, Bryan Kadzban br...@kadzban.is-a-geek.net wrote: On Tue, Aug 06, 2013 at 11:17:17AM +0200, Tom Gundersen wrote: On Tue, Aug 6, 2013 at 11:11 AM, Tom Gundersen t...@jklm.no wrote: On Tue, Aug 6, 2013 at 10:20 AM, Maarten Lankhorst m.b.lankho...@gmail.com wrote: Op 05-08-13 18:29, Andy Lutomirski schreef: The systemd commit below can delay firmware loading by multiple minutes if CONFIG_FW_LOADER_USER_HELPER=y. Unfortunately no one noticed that the systemd-udev change would break new kernels as well as old kernels. Since the kernel apparently can't count on reasonable userspace support, turn this thing off by default. commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a Author: Tom Gundersen t...@jklm.no Date: Mon Mar 18 15:12:18 2013 +0100 udev: make firmware loading optional and disable by default Distros that whish to support old kernels should set --with-firmware-dirs=/usr/lib/firmware/updates:/usr/lib/firmware to retain the old behaviour. methinks this patch should be reverted then, Well, all the code is still there, so it can be enabled if anyone wants it. or a stub should be added to udev to always fail firmware loading so timeouts don't occur. I think the only use (if any) of a userspace firmware loader would be for anyone who wants a custom one (i.e., not udev), so we shouldn't just fail the loading from udev unconditionally. How about we just improve the udev documentation a bit, similar to Andy's kernel patch? Sorry, I should first have checked. We already document this in the README: Userspace firmware loading is deprecated, will go away, and sometimes causes problems: CONFIG_FW_LOADER_USER_HELPER=n ...And this patch is making the kernel default to the correct behavior, instead of the now-broken-by-udev behavior. I'm not sure I see the issue with it? :-) Oh yeah this patch is totally the right thing to do, I was just arguing that there is nothing to be done on the udev side. (Add me to the list of people that think udev is broken too, fwiw. But let's at least not leave *both* sides in a broken-by-default state.) Well I don't think it is too much to ask that the kernel and udev should be configured in a consistent way. Especially as thing still work even if you get it wrong, albeit with a delay. Except that the current defaults are inconsistent and there is no explanation anywhere in the logs when this is screwed up. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Slow firmware timeouts again (Re: [3.11 regression?] iwlwifi firmware takes two minutes to load)
On Mon, Aug 5, 2013 at 4:18 AM, Kay Sievers k...@vrfy.org wrote: On Fri, Aug 2, 2013 at 6:28 PM, Zbigniew Jędrzejewski-Szmek zbys...@in.waw.pl wrote: On Fri, Aug 02, 2013 at 09:04:44AM -0700, Andy Lutomirski wrote: CONFIG_FW_LOADER_USER_HELPER=y Do you need this? Unsetting this should help. This option enables / disables the invocation of user-helper (e.g. udev) for loading firmware files as a fallback after the direct file loading in kernel fails. The user-mode helper is no longer required unless you have a special firmware file that resides in a non-standard path. On recent systems, if the kernel configures CONFIG_FW_LOADER_USER_HELPER=y and a firmware is not found by the kernel, the kernel will issue a request which is ignored by userspace and will block in that for 60 seconds. Udev is no longer in the game of firmware loading, not even as a fallback, it will just completely ignore all kernel firmware class events. The source code in udev to handle firmware requests is disabled by default, currently still kept around for old kernels without the in-kernel firmware loader, but it will be deleted in the near future. Any chance you'd consider a less regression-inducing path to getting rid of this feature? For example, have udev warn and immediate fail firmware loading requests for a few releases, then just warn, then drop support? Meanwhile, CONFIG_FW_LOADER_USER_HELPER is still default y (!), so udev has introduced massive bootup delays into the default configuration with no warning. It might be nice to change it to default n, get rid of everything that selects it, and possible even rename it to something with LEGACY or OBSOLETE in the name so that make oldconfig will prompt. --Andy Any issues with firmware timeouts should be addressed in the kernel by disabling CONFIG_FW_LOADER_USER_HELPER or by removing the fallback code from the in-kernel loader. Kay -- Andy Lutomirski AMA Capital Management, LLC ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH] Change CONFIG_FW_LOADER_USER_HELPER to default n and don't select it
The systemd commit below can delay firmware loading by multiple minutes if CONFIG_FW_LOADER_USER_HELPER=y. Unfortunately no one noticed that the systemd-udev change would break new kernels as well as old kernels. Since the kernel apparently can't count on reasonable userspace support, turn this thing off by default. commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a Author: Tom Gundersen t...@jklm.no Date: Mon Mar 18 15:12:18 2013 +0100 udev: make firmware loading optional and disable by default Distros that whish to support old kernels should set --with-firmware-dirs=/usr/lib/firmware/updates:/usr/lib/firmware to retain the old behaviour. --- drivers/base/Kconfig | 15 +++ drivers/firmware/Kconfig | 1 - 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 5daa259..de3903e 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -146,13 +146,20 @@ config EXTRA_FIRMWARE_DIR config FW_LOADER_USER_HELPER bool Fallback user-helper invocation for firmware loading depends on FW_LOADER - default y + default n help This option enables / disables the invocation of user-helper (e.g. udev) for loading firmware files as a fallback after the - direct file loading in kernel fails. The user-mode helper is - no longer required unless you have a special firmware file that - resides in a non-standard path. + direct file loading in kernel fails. + + Since March 2013, a default udev build does not understand + firmware loading requests. These udev versions will not + even indicate failure; instead they cause long timeouts. + This can dramatically slow down the boot process. + + Say Y only if you have special firmware-loading requirements + and if you have a non-standard helper that will handle these + requests. config DEBUG_DRIVER bool Driver Core verbose debug messages diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig index 07478728..9387630 100644 --- a/drivers/firmware/Kconfig +++ b/drivers/firmware/Kconfig @@ -64,7 +64,6 @@ config DELL_RBU tristate BIOS update support for DELL systems via sysfs depends on X86 select FW_LOADER - select FW_LOADER_USER_HELPER help Say m if you want to have the option of updating the BIOS for your DELL system. Note you need a Dell OpenManage or Dell Update package (DUP) -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Slow firmware timeouts again (Re: [3.11 regression?] iwlwifi firmware takes two minutes to load)
[cc: linux-kernel, linux-hotplug, and systemd-devel. This is 3.11-rc3+] On Fri, Aug 2, 2013 at 12:38 AM, Johannes Berg johan...@sipsolutions.net wrote: On Thu, 2013-08-01 at 21:38 -0700, Andy Lutomirski wrote: At boot, I get: [ 12.537108] iwlwifi :03:00.0: irq 51 for MSI/MSI-X ... [ 132.676781] iwlwifi :03:00.0: loaded firmware version 9.221.4.1 build 25532 op_mode iwldvm This sounds familiar, but wasn't it fixed awhile ago? It wasn't exactly fixed and it's really more of a userspace problem - we probably request firmware version 8, and then it takes 30 seconds to time out for each of 8,7,6,5, after which the next request for 4 is successful. Why's it requesting those firmwares? They don't seem to exist on intellinuxwireless.org. I have: CONFIG_FW_LOADER=y # CONFIG_FIRMWARE_IN_KERNEL is not set CONFIG_EXTRA_FIRMWARE= CONFIG_FW_LOADER_USER_HELPER=y I don't know why your userspace isn't behaving differently though. johannes -- Andy Lutomirski AMA Capital Management, LLC ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Slow firmware timeouts again (Re: [3.11 regression?] iwlwifi firmware takes two minutes to load)
On Fri, Aug 2, 2013 at 9:21 AM, Johannes Berg johan...@sipsolutions.net wrote: On Fri, 2013-08-02 at 09:04 -0700, Andy Lutomirski wrote: It wasn't exactly fixed and it's really more of a userspace problem - we probably request firmware version 8, and then it takes 30 seconds to time out for each of 8,7,6,5, after which the next request for 4 is successful. Why's it requesting those firmwares? They don't seem to exist on intellinuxwireless.org. Well for one you've never even mentioned what device you have, and then also it's not requesting 8/7 only 6,5,4 -- I guess the timeout was increased to 60 seconds (or I'm remembering wrong and it always was? I thought it was 30s) I have an Ultimate-N 6300 (rev 35). It's requesting at least versions 6 and 5 (I saw them in udevadm monitor). It looks like the g2a and g2b variants have -5 and -6 versions, but 6000-4 appears to be the only relevant version for my hardware. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [HEADSUP] cgroup changes
On Jun 25, 2013 2:43 AM, Lennart Poettering lenn...@poettering.net wrote: On Mon, 24.06.13 17:09, Andy Lutomirski (l...@amacapital.net) wrote: On Mon, Jun 24, 2013 at 4:57 PM, Lennart Poettering lenn...@poettering.net wrote: On Mon, 24.06.13 16:01, Andy Lutomirski (l...@amacapital.net) wrote: AFAICT the main reason that systemd uses cgroup is to efficiently track which service various processes came from and to send signals, and it seems like that use case could be handled without cgroups at all by creative use of subreapers and a syscall to broadcast a signal to everything that has a given subreaper as an ancestor. In that case, systemd could be asked to stay away from cgroups even in the single-hierarchy case. systemd uses cgroups to manage services. Managing services means many things. Among them: keeping track of processes, listing processes of a service, killing processes of a service, doing per-service logging (which means reliably, immediately, and race-freely tracing back messages to the service which logged them), about 55 other things, and also resource management. I don't see how I can do anything of this without something like cgroups, i.e. hierarchial, resource management involved systemd which allows me to securely put labels on processes. Boneheaded straw-man proposal: two new syscalls and a few spare processes. int sys_task_reaper(int tid): Returns the reaper for the task tid (which is 1 if there's no subreaper). (This could just as easily be a file in /proc.) int sys_killall_under_subreaper(int subreaper, int sig): Broadcasts sig to all tasks under subreaper (excluding subreaper). Guarantees that, even if those tasks are forking, they all get the signal. Then, when starting a service, systemd forks, sets the child to be a subreaper, then forks that child again to exec the service. Does this do everything that's needed? No. It doesn't do anything that's needed. How do I list all PIDs in a service with this? Walk /proc/subreaper/children recursively. A kernel patch to make that field show up unconditionally instead of hiding under EXPERT would help. How do I determine the service of a PID? Call sys_task_reaper, then look up what service that subreaper comes from. How do i do resource manage with this? With cgroups, unless the admin has configured systemd not to use cgroups, in which case you don't. (The whole point would be to keep DefaultControllers= without using the one and only cgroup hierarchy.) --Andy sys_task_reaper is trivial to implement (that functionality is already there in the reparenting code), and sys_killall_under_subreaper is probably not so bad. This has one main downside I can think of: it wastes a decent number of processes (one subreaper per service). Yeah, also the downside that it doesn't do what we need. Lennart -- Lennart Poettering - Red Hat, Inc. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [HEADSUP] cgroup changes
On 06/21/2013 10:36 AM, Lennart Poettering wrote: 2) This hierarchy becomes private property of systemd. systemd will set it up. Systemd will maintain it. Systemd will rearrange it. Other software that wants to make use of cgroups can do so only through systemd's APIs. This single-writer logic is absolutely necessary, since interdependencies between the various controllers, the various attributes, the various cgroups are non-obvious and we simply cannot allow that cgroup users alter the tree independently of each other forever. Due to all this: The Pax Cgroup document is a thing of the past, it is dead. If you are using non-trivial cgroup setups with systemd right now, then things will change for you. We will provide you with similar functionality as before, but things will be different and less low-level. As long as you only used the high-level options such as CPUShares, MemoryLimit and so on you should be on the safe side. Hmm. This may be tricky for my use case. Here are a few issues. For all I know, they may already be supported (or planned), but I don't want to get caught. 1. I put all the entire world into a separate, highly constrained cgroup. My real-time code runs outside that cgroup. This seems to exactly what slices are for, but I need kernel threads to go in to the constrained cgroup. Will systemd support this? 2. I manage services and tasks outside systemd (for one thing, I currently use Ubuntu, but even if I were on Fedora, I have a bunch of fine-grained things that figure out how they're supposed to allocate resources, and porting them to systemd just to keep working in the new world order would be a PITA [1]). (cgroups have the odd feature that they are per-task, not per thread group, and the systemd proposal seems likely to break anything that actually wants task granularity. I may actually want to use this, even though it's a bit evil -- my real-time thread groups have non-real-time threads.) I think that what I want are something like sub-unit cgroups -- I want to be able to ask systemd to further subdivide the group for my unit, login session, or whatever. Would this be reasonable? (Another way of thinking of this is that a unit would have a whole cgroup hierarchy instead of just one cgroup.) I think that the single-hierarchy model will require that I subdivide my user session so that the default sub-unit cgroup is constrained similarly to the default slice. I'll lose functionality, but I don't think this is a showstopper. A different approach would be to allow units to (with systemd's cooperation) escape into their own, dynamically created unit. This seems kind of awful. 3. My code runs unprivileged, but it still wants to configure itself. If needed, I can write a little privileged daemon to handle the systemd calls. I think I can get away without anything fancy if a unit (login session?) grant the right to manipulate sub-unit cgroups to a non-root user. 4. As mentioned, I'm on Ubuntu some of the time. I'd like to keep the same code working on systemd and non-systemd systems. How hard would it be to run systemd as just a cgroup controller? That is, have systemd create its slices, run exactly one unit that represents the whole system, and let other things use the cgroup API. [1] Some day, I might convert my code to use a session systemd instance. I'm not holding my breath, but it could be nice. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [HEADSUP] cgroup changes
On Mon, Jun 24, 2013 at 6:27 AM, Lennart Poettering lenn...@poettering.net wrote: On Sat, 22.06.13 15:19, Andy Lutomirski (l...@amacapital.net) wrote: 2. I manage services and tasks outside systemd (for one thing, I currently use Ubuntu, but even if I were on Fedora, I have a bunch of fine-grained things that figure out how they're supposed to allocate resources, and porting them to systemd just to keep working in the new world order would be a PITA [1]). [...] I think that what I want are something like sub-unit cgroups -- I want to be able to ask systemd to further subdivide the group for my unit, login session, or whatever. Would this be reasonable? (Another way of thinking of this is that a unit would have a whole cgroup hierarchy instead of just one cgroup.) The idea is not even to allow this. Basically, if you want to partitions your daemon into different cgroups you need to do that through systemd's abstractions: slices and services. To make this more palatable we'll introduce throw-away units though, so that you can dynamically run something as a workload and don't need to be concerned about naming this, or cleaning it up. Hmm. My particular software can maybe live with this with unpleasant modifications, but this will break anything that, say, accepts a connection from a client, forks into a (possibly new) cgroup based on the identity of that client, and then does something. How can this support containers or the use of cgroups in a non-systemwide systemd instance? Containers may no longer be allowed to escape from the cgroup they start in, but there should (IMO) still be a way for things to subdivide their cgroup-controlled resources. If I want to have a hierarchy more than two levels deep, I suspect I'm SOL under this model. If I'm understanding correctly, there will be slices, then units, and that's it. 4. As mentioned, I'm on Ubuntu some of the time. I'd like to keep the same code working on systemd and non-systemd systems. How hard would it be to run systemd as just a cgroup controller? That is, have systemd create its slices, run exactly one unit that represents the whole system, and let other things use the cgroup API. I have no idea, I don't develop Ubuntu. They will have to come up with some cgroup maintenance daemon of their own. As I know them they'll either do a port of the systemd counter part (but that's going to be tough!), or they'll stick something half-baked into Upstart... Sorry, if this all sounds a bit disappointing. But yeah, this all is not a trivial change... I'm worried that the impedance mismatch between systemd and any other possible API is going to be enormous. On systemd, I'll have to: - Create a throwaway unit - Figure out how to wire up stdout and stderr correctly (I use them for communication between processes) - Translate the current directory, the environment, etc. into systemd configuration - Translate my desired resource controls into systemd's let's pretend that there aren't really cgroups underlying it configuration - Start the throwaway unit - Figure out how to get notified when it finishes Without systemd, I'll have to: - fork() - Ask whatever is managing cgroups to switch me to a different cgroup - exec() This is going to suck, I think. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [HEADSUP] cgroup changes
On Mon, Jun 24, 2013 at 12:10 PM, Tejun Heo t...@kernel.org wrote: Hello, Andy. On Mon, Jun 24, 2013 at 11:49:05AM -0700, Andy Lutomirski wrote: I have an idea where it should be headed in the long term but am not sure about short-term solution. Given that the only sort wide-spread use case is virt kthreads, maybe it just needs to be special cased for now. Not sure. I'll be okay (I think) if I can reliably set affinities of these threads. I'm currently doing it with cgroups. That being said, I don't like the direction that kernel thread magic affinity is going. It may be great for cache performance and reducing random bounding, but I have a scheduling-jitter-sensitive workload and I don't care about overall system throughput. I need the kernel to stay the f!k off my important cpus, and arranging for this to happen is becoming increasingly complicated. Why is it becoming increasingly complicated? The biggest change probably was the shared workqueue pool implementation but that was years ago and workqueue has grown pool attributes recently adding more properly designed flexibility and, for example, adding default affinity for !per-cpu workqueues should be pretty easy now. But anyways, if it's an issue, it should be examined and properly solved rather than hacking up hacky solution with cgroup. Because more things are becoming per cpu without the option of moving of per-cpu things on behalf of one cpu to another cpu. RCU is a nice exception. cgroups are most certainly something that a binary can be aware of. It's not like a sysctl knob at all -- it's per process. I have lots No, it definitely is not. Sure it is more granular than sysctl but that's it. It exposes control knobs which are directly tied into kernel implementation details. It is not a properly designed programming API by any stretch of imagination. It is an extreme failure on the kernel side that that part hasn't been made crystal clear from the beginning. I don't know how intentional it was but the whole thing is completely botched. cgroup *never* was held to the standard necessary for any widely available API and many of the controls it exposes are exactly at the level of sysctls. As the interface was filesystem, it could evade scrutiny and with the hierarchical organization also gave the impression that it's something which can be used directly by individual applications. It found a loophole in the way we implement and police kernel APIs and then exploited it like there's no tomorrow. We are firmly bound to maintain what already has been exposed from the kernel side and I'm not gonna break any of them but the free-for-all cgroup is broken and deprecated. It's gonna wither and fade away and any attempt to reverse that will be met with extreme prejudice. The functionality I care about is that a program can reliably and hierarchically subdivide system resources -- think rlimits but actually useful. I, and probably many other things, want this functionality. Yes, the current cgroup interface is awful, but it gets one thing right: it's a hierarchy. Back when my software ran on Windows, I used the awful job interface to allocate resources among different parts of my software. When I switched to Linux, I lost some of that functionality and replaced other bits with cgroups. It's hackish, but it works. Now we're apparently moving toward having a unified hierarchy (great!), a more sane API (great!), and a nasty userspace situation where systemd-using systems control the hierarchy through a highly limiting systemd-specific interface and non-systemd systems do something else which will presumably look nothing like what systemd does. I would argue that designing a kernel interface that requires exactly one userspace component to manage it and ties that one userspace component to something that can't easily be deployed everywhere (the init system) is as big a cheat as the old approach of sneaking bad APIs in through a filesystem was. IOW, please, when designing this, please specify an API that programs are permitted to use, and let that API be reviewed. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [HEADSUP] cgroup changes
On Mon, Jun 24, 2013 at 12:37 PM, Tejun Heo t...@kernel.org wrote: Hello, On Mon, Jun 24, 2013 at 12:24:38PM -0700, Andy Lutomirski wrote: Because more things are becoming per cpu without the option of moving of per-cpu things on behalf of one cpu to another cpu. RCU is a nice exception. Hmm... but in most cases it's per-cpu on the same cpu that initiated the task. If a given CPU is just crunching numbers and IRQ affinity is properly configured, the CPU shouldn't be bothered too much by per-cpu work items. If there are, please let us know. We can hunt them down. I'm not just crunching numbers -- I do (nonblocking) I/O as well. The functionality I care about is that a program can reliably and hierarchically subdivide system resources -- think rlimits but actually useful. I, and probably many other things, want this functionality. Yes, the current cgroup interface is awful, but it gets one thing right: it's a hierarchy. And the hierarchy support was completely broken for many resource controllers up until only several releases ago. I would argue that designing a kernel interface that requires exactly one userspace component to manage it and ties that one userspace component to something that can't easily be deployed everywhere (the init system) is as big a cheat as the old approach of sneaking bad APIs in through a filesystem was. In terms of API, it is firmly at the level of sysctl. That's it. While I agree that having a proper kernel API for hierarchical resource management could be nice. That currently is out of scope. We're already knee-deep in shit with the limited capabilities we're trying to implement. Also, I really don't think cgroup is the right interface for such thing even if we get to that. It should be part of the usual process/thread model, not this completely separate thing on the side. IOW, please, when designing this, please specify an API that programs are permitted to use, and let that API be reviewed. cgroup is not that API and it's never gonna be in all likelihood. As for systemd vs. non-systemd compatibility, I'm afraid I don't have a good answer. This is still all in a pretty earlly phase and the proper abstractions and APIs are being figured out. Hopefully, we'll converge on a mostly compatible high-level abstraction which can be presented regardless of the actual base system implementation. So what is cgroup for? That is, what's the goal for what the new API should be able to do? AFAICT the main reason that systemd uses cgroup is to efficiently track which service various processes came from and to send signals, and it seems like that use case could be handled without cgroups at all by creative use of subreapers and a syscall to broadcast a signal to everything that has a given subreaper as an ancestor. In that case, systemd could be asked to stay away from cgroups even in the single-hierarchy case. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [HEADSUP] cgroup changes
On Mon, Jun 24, 2013 at 4:19 PM, Tejun Heo t...@kernel.org wrote: Hello, On Mon, Jun 24, 2013 at 04:01:07PM -0700, Andy Lutomirski wrote: So what is cgroup for? That is, what's the goal for what the new API should be able to do? It is a for controlling and distributing resources. That part doesn't change. It's just not built to be used directly by individual applications. It's an admin tool just like sysctl - be that admin be a human or userland base system. There's a huge chasm between something which can be generally used by normal applications and something which is restricted to admins and base systems in terms of interface generality and stability, security, how the abstractions fit together with the existing APIs and so on. cgroup firmly belongs to the former. It still serves the same purpose but isn't, in a way, developed enough to be used directly by individual applications and I'm not even sure we want or need to develop it to such a level. My application is running on a single-purpose system I administer. I guess what I'm trying to say here is that many systems will rather fundamentally use systemd. Admins of those systems should still have access to a reasonably large subset of cgroup functionality. If the single-hierarchy model is going to prevent going around systemd and if systemd isn't going to expose all of the useful cgroup functionality, then perhaps there should be a way to separate systemd's hierarchy from the cgroup hierarchy. Looking at http://0pointer.de/blog/projects/cgroups-vs-cgroups.html, it looks like systemd doesn't actually need the cgroup resource control functionality. Maybe there's a way to disentangle this stuff. The /proc/pid/children feature that CRIU added seems like a decent start. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [HEADSUP] cgroup changes
On Mon, Jun 24, 2013 at 4:37 PM, Tejun Heo t...@kernel.org wrote: Hello, Andy. On Mon, Jun 24, 2013 at 04:27:17PM -0700, Andy Lutomirski wrote: I guess what I'm trying to say here is that many systems will rather fundamentally use systemd. Admins of those systems should still have access to a reasonably large subset of cgroup functionality. If the single-hierarchy model is going to prevent going around systemd and if systemd isn't going to expose all of the useful cgroup functionality, then perhaps there should be a way to separate systemd's hierarchy from the cgroup hierarchy. I don't think systemd will prevent you from buildling your own hierarchy on the side. It sure won't be properly supported and things might break in corener cases / over time but if you're willing to take such risks anyway... In the long term tho, what should happen probably is examining use cases like yours and then incorporating sensible mechanisms to support that into the base system infrastructure. It might not be completely identical but I'm sure over time we'll be able to find what are the fundamental pieces and proper abstractions. Right now, we're exposing way too much without even clearly understanding what are being enabled. It is unsustainable. Now I'm confused. I thought that support for multiple hierarchies was going away. Is it here to stay after all? --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [HEADSUP] cgroup changes
On Mon, Jun 24, 2013 at 4:40 PM, Tejun Heo t...@kernel.org wrote: Hello, On Mon, Jun 24, 2013 at 4:38 PM, Andy Lutomirski l...@amacapital.net wrote: Now I'm confused. I thought that support for multiple hierarchies was going away. Is it here to stay after all? It is going to be deprecated but also stay around for quite a while. That said, I didn' t mean to use multiple hierarchies. I was saying that if you build a sub-hierarchy in the unified hierarchy, you're likely to get away with it in most cases. Isn't that exactly what I was originally asking for? Quoting from earlier in the thread: On Mon, Jun 24, 2013 at 6:27 AM, Lennart Poettering lenn...@poettering.net wrote: On Sat, 22.06.13 15:19, Andy Lutomirski (l...@amacapital.net) wrote: 2. I manage services and tasks outside systemd (for one thing, I currently use Ubuntu, but even if I were on Fedora, I have a bunch of fine-grained things that figure out how they're supposed to allocate resources, and porting them to systemd just to keep working in the new world order would be a PITA [1]). [...] I think that what I want are something like sub-unit cgroups -- I want to be able to ask systemd to further subdivide the group for my unit, login session, or whatever. Would this be reasonable? (Another way of thinking of this is that a unit would have a whole cgroup hierarchy instead of just one cgroup.) The idea is not even to allow this. Basically, if you want to partitions your daemon into different cgroups you need to do that through systemd's abstractions: slices and services. To make this more palatable we'll introduce throw-away units though, so that you can dynamically run something as a workload and don't need to be concerned about naming this, or cleaning it up. If I can subdivide my service in the hierarchy, then I'm happy. If this gets lost *and* systemd insists on controlling the one and only cgroup hierarchy, then I think I have serious problems with the new regime. --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [HEADSUP] cgroup changes
On Mon, Jun 24, 2013 at 4:57 PM, Lennart Poettering lenn...@poettering.net wrote: On Mon, 24.06.13 16:01, Andy Lutomirski (l...@amacapital.net) wrote: AFAICT the main reason that systemd uses cgroup is to efficiently track which service various processes came from and to send signals, and it seems like that use case could be handled without cgroups at all by creative use of subreapers and a syscall to broadcast a signal to everything that has a given subreaper as an ancestor. In that case, systemd could be asked to stay away from cgroups even in the single-hierarchy case. systemd uses cgroups to manage services. Managing services means many things. Among them: keeping track of processes, listing processes of a service, killing processes of a service, doing per-service logging (which means reliably, immediately, and race-freely tracing back messages to the service which logged them), about 55 other things, and also resource management. I don't see how I can do anything of this without something like cgroups, i.e. hierarchial, resource management involved systemd which allows me to securely put labels on processes. Boneheaded straw-man proposal: two new syscalls and a few spare processes. int sys_task_reaper(int tid): Returns the reaper for the task tid (which is 1 if there's no subreaper). (This could just as easily be a file in /proc.) int sys_killall_under_subreaper(int subreaper, int sig): Broadcasts sig to all tasks under subreaper (excluding subreaper). Guarantees that, even if those tasks are forking, they all get the signal. Then, when starting a service, systemd forks, sets the child to be a subreaper, then forks that child again to exec the service. Does this do everything that's needed? sys_task_reaper is trivial to implement (that functionality is already there in the reparenting code), and sys_killall_under_subreaper is probably not so bad. This has one main downside I can think of: it wastes a decent number of processes (one subreaper per service). --Andy ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel