Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-09 Thread Andy Lutomirski
On Wed, Dec 9, 2020 at 11:22 AM Topi Miettinen  wrote:
>
> On 9.12.2020 17.14, Andy Lutomirski wrote:
> >

> Maybe also malware which can escape all means of detection, enforced by
> the CPU? Though I don't know if any malware scanners for Linux work can
> check for fileless, memory only malware.

I don't think this is really relevant to malware detection.  You can't
do syscalls from SGX code, for example, and, even if you could,
malware behavior analysis would be unaffected.  The concern seems to
be more that, once someone has discovered some malware, if it's
protected by SGX then it's plausible that you can't disassemble it.

>
> >
> > In Intel’s original vision, only specially licensed vendors could create 
> > SGX software, but Linux pushed back against this quite hard, and new CPUs 
> > allow unlicensed enclaves. So your Skylake CPUs support SGX, but not on 
> > Linux.
>
> Kudos to Linux for the push.

:)

I don't know if Linux gets full credit for this, but I think we at
least had some impact.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-09 Thread Andy Lutomirski

> On Dec 9, 2020, at 12:58 AM, Topi Miettinen  wrote:
> 
> On 9.12.2020 2.42, Jarkko Sakkinen wrote:
>>> On Wed, Dec 09, 2020 at 02:15:28AM +0200, Jarkko Sakkinen wrote:
>>> On Wed, Dec 09, 2020 at 01:15:27AM +0200, Topi Miettinen wrote:
>>> As  a further argument, I just did this on a Fedora system:
>>> $ find /dev -perm /ugo+x -a \! -type d -a \! -type l
>>> No results.  So making /dev noexec doesn't seem to have any benefit.
>> 
>> It's no surprise that there aren't any executables in /dev since
>> removing MAKEDEV ages ago. That's not the issue, which is that
>> /dev is a writable directory (for UID=0 but no capabilities are
>> needed) and thus a potential location for constructing unapproved
>> executables if it is also mounted exec (W^X).
> 
> UID 0 can just change mount options, though, unless SELinux or similar is 
> used. And SELinux can protect /dev just fine without noexec.
 
 Well, mounting would need CAP_SYS_ADMIN in addition to UID 0. Also SELinux
 is not universal and the policies might not contain all users or services.
 
 -Topi
>>> 
>>> What's the data that supports having noexec /dev anyway? With root
>>> access I can then just use something else like /dev/shm mount.
>>> 
>>> Has there been out in the wild real world cases that noexec mount
>>> of would have prevented?
>> Typo: "of" = "of /dev"
>>> For me this sounds a lot just something that "feels more secure"
>>> without any measurable benefit. Can you prove me wrong?
>> The debate is circled around something not well defined. Of course you
>> get theoretically more safe system when you decrease priviliges anywhere
>> in the system. Like you could start do grazy things with stuff that
>> unprivilged user has access, in order to prevent malware to elevate to
>> UID 0 in the first place.
>> I think where this go intellectually wrong is that we are talking about
>> *default installation* of a distribution. That should have somewhat sane
>> common sense access control settings. For like a normal desktop user
>> noexec /dev will not do any possible favor.
>> Then there is the case when you want to harden installation for an
>> application, let's' say some server. In that case you will anyway
>> fine-tune the security settings and go grazy enough with hardening.
>> When you tailor a server, it's a standard practice to enumerate and
>> adjust the mount points if needed.
> 
> I think we agree that there's a need for either way to allow SGX (if default 
> is hardened) or a hardening option (in the opposite case). For systemd I see 
> two approaches:
> 
> 1. Default noexec /dev, override with something like
> - ExecPaths=/dev
> - MountOptions=/dev:rw,exec,dev,nosuid
> - or even MountOptions=/dev/sgx:rw,exec,dev,nosuid
> - ProtectDev=no
> - AllowSGX=yes
> 
> 2. Default exec /dev, override with
> - NoExecPaths=/dev
> - MountOptions=/dev:rw,noexec,dev,nosuid
> - ProtectDev=yes
> - DenySGX=yes
> 
> I'd prefer 1. but of course 2. would be reasonable.

I would argue for 2, for the following reason.  I absolutely agree that 
hardening a system by making it impossible to create executable code 
dynamically is valuable, but I don’t think it’s a good default. By default, 
programs like gcc and clang should work, but so should JITs, and JITs are 
getting more popular and powerful all the time.  In a default setting that 
allows JITs, etc, I see no benefit at all to making /dev noexec. To the 
contrary, making /dev noexec seems like plugging a little restricted corner of 
code creation (because it requires UID=0) while allowing the easy ways (/tmp, 
/home, /dev/shm, unshare(2), mmap(), etc).  By all means let admins harden 
this, but I see no reason to apply some of the hardening when the rest is 
disabled.

> 
>> To summarize, I neither understand the intended target audience.
> 
> We have something in common: me neither. What's the target audience for SGX? 
> What's the use case? What are the users: browsers, system services? How would 
> applications use SGX? Should udev rules for /dev/sgx make it available to any 
> logged in users with uaccess tags?
> 
> 

I would certainly like it to be available to all software, with the possible 
exception of extra-hardened systems. Using SGX is not really an interesting 
attack surface. The main threat is that malware might use SGX to make itself 
hard to reverse engineer.

In Intel’s original vision, only specially licensed vendors could create SGX 
software, but Linux pushed back against this quite hard, and new CPUs allow 
unlicensed enclaves. So your Skylake CPUs support SGX, but not on Linux.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-08 Thread Andy Lutomirski

> On Dec 8, 2020, at 12:45 PM, Topi Miettinen  wrote:
> 
> On 8.12.2020 20.07, Andy Lutomirski wrote:
>>> On Thu, Nov 19, 2020 at 10:05 AM Topi Miettinen  wrote:
>>> 
>>> On 19.11.2020 18.32, Zbigniew Jędrzejewski-Szmek wrote:
>>>> On Thu, Nov 19, 2020 at 08:17:08AM -0800, Andy Lutomirski wrote:
>>>>> Hi udev people-
>>>>> 
>>>>> The upcoming Linux SGX driver has a device node /dev/sgx.  User code
>>>>> opens it, does various setup things, mmaps it, and needs to be able to
>>>>> create PROT_EXEC mappings.  This gets quite awkward if /dev is mounted
>>>>> noexec.
>>>>> 
>>>>> Can udev arrange to make a device node executable on distros that make
>>>>> /dev noexec?  This could be done by bind-mounting from an exec tmpfs.
>>>>> Alternatively, the kernel could probably learn to ignore noexec on
>>>>> /dev/sgx, but that seems a little bit evil.
>>>> 
>>>> I'd be inclined to simply drop noexec from /dev by default.
>>>> We don't do noexec on either /tmp or /dev/shm (because that causes 
>>>> immediate
>>>> problems with stuff like Java and cffi). And if you have those two at your
>>>> disposal anyway, having noexec on /dev doesn't seem important.
>>> 
>>> I'd propose to not enable exec globally, but if a service needs SGX, it
>>> could use something like MountOptions=/dev:exec only in those cases
>>> where it's needed. That way it's possible to disallow writable and
>>> executable file systems for most services (which typically don't need
>>> /tmp or /dev/shm either). Of course the opposite
>>> (MountOptions=/dev:noexec) would be also possible, but I'd expect that
>>> this would be needed to be used more often.
>>> 
>> I imagine the opposite would be more sensible.  It seems odd to me
>> that we would want any SGX-using service to require both special mount
>> options and regular ACL permissions.
> 
> How common are thes SGX-using services? Will every service start using it 
> without any special measures taken on it's behalf, or perhaps only a special 
> SGX control tool needs access? What about unprivileged user applications, do 
> they ever want to access SGX? Could something like Widevine deep in a browser 
> need to talk to SGX in a DRM scheme?

I honestly don’t know. Widevine is probably some unholy mess of SGX and ME 
crud. But regular user programs may well end up using SGX for little non-evil 
enclaves, e.g. storing their keys securely.  It would be nice if unprivileged 
enclaves just work as long as the use has appropriate permissions on the device 
nodes.

SGX adoption has been severely hampered by the massive series of recent 
vulnerabilities and by Intel’s silly licensing scheme. The latter won’t be 
supported upstream.

> 
>> As  a further argument, I just did this on a Fedora system:
>> $ find /dev -perm /ugo+x -a \! -type d -a \! -type l
>> No results.  So making /dev noexec doesn't seem to have any benefit.
> 
> It's no surprise that there aren't any executables in /dev since removing 
> MAKEDEV ages ago. That's not the issue, which is that /dev is a writable 
> directory (for UID=0 but no capabilities are needed) and thus a potential 
> location for constructing unapproved executables if it is also mounted exec 
> (W^X).

UID 0 can just change mount options, though, unless SELinux or similar is used. 
And SELinux can protect /dev just fine without noexec.

> 
> -Topi
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Creating executable device nodes in /dev?

2020-12-08 Thread Andy Lutomirski
On Thu, Nov 19, 2020 at 10:05 AM Topi Miettinen  wrote:
>
> On 19.11.2020 18.32, Zbigniew Jędrzejewski-Szmek wrote:
> > On Thu, Nov 19, 2020 at 08:17:08AM -0800, Andy Lutomirski wrote:
> >> Hi udev people-
> >>
> >> The upcoming Linux SGX driver has a device node /dev/sgx.  User code
> >> opens it, does various setup things, mmaps it, and needs to be able to
> >> create PROT_EXEC mappings.  This gets quite awkward if /dev is mounted
> >> noexec.
> >>
> >> Can udev arrange to make a device node executable on distros that make
> >> /dev noexec?  This could be done by bind-mounting from an exec tmpfs.
> >> Alternatively, the kernel could probably learn to ignore noexec on
> >> /dev/sgx, but that seems a little bit evil.
> >
> > I'd be inclined to simply drop noexec from /dev by default.
> > We don't do noexec on either /tmp or /dev/shm (because that causes immediate
> > problems with stuff like Java and cffi). And if you have those two at your
> > disposal anyway, having noexec on /dev doesn't seem important.
>
> I'd propose to not enable exec globally, but if a service needs SGX, it
> could use something like MountOptions=/dev:exec only in those cases
> where it's needed. That way it's possible to disallow writable and
> executable file systems for most services (which typically don't need
> /tmp or /dev/shm either). Of course the opposite
> (MountOptions=/dev:noexec) would be also possible, but I'd expect that
> this would be needed to be used more often.
>

I imagine the opposite would be more sensible.  It seems odd to me
that we would want any SGX-using service to require both special mount
options and regular ACL permissions.

As  a further argument, I just did this on a Fedora system:

$ find /dev -perm /ugo+x -a \! -type d -a \! -type l

No results.  So making /dev noexec doesn't seem to have any benefit.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Creating executable device nodes in /dev?

2020-11-19 Thread Andy Lutomirski
Hi udev people-

The upcoming Linux SGX driver has a device node /dev/sgx.  User code
opens it, does various setup things, mmaps it, and needs to be able to
create PROT_EXEC mappings.  This gets quite awkward if /dev is mounted
noexec.

Can udev arrange to make a device node executable on distros that make
/dev noexec?  This could be done by bind-mounting from an exec tmpfs.
Alternatively, the kernel could probably learn to ignore noexec on
/dev/sgx, but that seems a little bit evil.

Thanks,
Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] kdbus refactoring?

2015-11-09 Thread Andy Lutomirski
On Mon, Nov 9, 2015 at 9:07 AM, Greg KH <gre...@linuxfoundation.org> wrote:
> On Mon, Nov 09, 2015 at 05:02:45PM +, Måns Rullgård wrote:
>> Andy Lutomirski <l...@amacapital.net> writes:
>>
>> > On Sun, Nov 8, 2015 at 3:30 PM, Greg KH <gre...@linuxfoundation.org> wrote:
>> >> On Sun, Nov 08, 2015 at 10:39:43PM +0100, Richard Weinberger wrote:
>> >>> On Sun, Nov 8, 2015 at 10:35 PM, Greg KH <gre...@linuxfoundation.org> 
>> >>> wrote:
>> >>> > On Sun, Nov 08, 2015 at 10:06:31PM +0100, Richard Weinberger wrote:
>> >>> >> Hi all,
>> >>> >>
>> >>> >> after reading on the removal of kdbus from Rawhide[1] I've searched
>> >>> >> the mailinglist archives for more details but didn't find anything.
>> >>> >> So, what are your plans?
>> >>> >>
>> >>> >> [1] 
>> >>> >> https://lists.fedoraproject.org/pipermail/kernel/2015-October/006011.html
>> >>> >
>> >>> > As that link said, based on the result of the code being in Rawhide, it
>> >>> > is now being reworked / redesigned.  The result will be posted for
>> >>> > review "when it's ready".
>> >>>
>> >>> If you rework/redesign something you have to know what you want to 
>> >>> change.
>> >>> That's why I was asking for the plan...
>> >>
>> >> Since when do people post "plans" or "design documents" on lkml without
>> >> real code?  Again, code will be posted when it's ready, like any other
>> >> kernel submission.
>> >
>> > I ask for feedback on ideas and designs on a fairly regular basis.  I
>> > even frequently get valuable feedback :)
>> >
>> > I would like to think that the kernel community would have something
>> > of value to add to the process of designing and implementing a major
>> > new IPC mechanism.
>>
>> The "trust us, we'll show it when it's ready" attitude reminds me of the
>> controversial TPP and TTIP negotiations.
>
> Ok, that's just trolling, cut it out.
>
> When something is even in the "hey look, it works, here's the big
> changes from last time", we will of course post it, but right now,
> things are being totally revisited based on the feedback we have
> received so far.  Give people a chance to recover from conferences and
> then get back to work...
>

I hate to say this, but this approach to receiving feedback makes me
really dislike the process.

I read a fairly large fraction of the kdbus code.  I found what I
perceived to be issues, and I spoke up.  I was told for quite a while
that the authors disagreed that the issues I found were issues and
that my assessment of the security aspects of the code was correct.

Now the submission has been withdrawn (because of feedback received so
far?  from me?) and there will apparently be a new submission out of
the blue, allegedly based on feedback.

As a developer, I'm willing to ask for feedback on ideas and to ask
for feedback on code.  In many cases, I've gotten (correct!) feedback
telling me that my design is wrong or needs major changes.  I *always*
try to answer such feedback respectfully and, if the reviewers are
right (which they usually are), I won't keep shoving code they don't
like in their face.  In fact, IIRC I got my start as a serious x86
developer when I wrote some code and tglx told me that the way I
designed it was unacceptable for upstream.  In response, I thought
about what the issues were, asked some questions, and rewrite the
majority of the code.  I think I learned a lot from the process, and
the code was vastly improved as a result.  If I'd sent substantially
the same patch series three or four more times and then declared that
I was withdrawing it without commenting directly on what I'd changed,
I really doubt that anyone would have taken my next submission
seriously.

Please understand that kdbus' approach to receiving feedback is very
off-putting.  Fortunately the vast majority of kernel developers
receive feedback for graciously and transparently, because otherwise
I'd probably just never review anything.  Frankly, if I were in the
chain of people through whom the kdbus code would flow to an eventual
home in Linus' tree, I would just say that the developers have used up
my patience as a reviewer and the onus would be on the developers to
demonstrate that it's worth my time to continue thinking about the
code.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] kdbus refactoring?

2015-11-09 Thread Andy Lutomirski
On Sun, Nov 8, 2015 at 3:30 PM, Greg KH  wrote:
> On Sun, Nov 08, 2015 at 10:39:43PM +0100, Richard Weinberger wrote:
>> On Sun, Nov 8, 2015 at 10:35 PM, Greg KH  wrote:
>> > On Sun, Nov 08, 2015 at 10:06:31PM +0100, Richard Weinberger wrote:
>> >> Hi all,
>> >>
>> >> after reading on the removal of kdbus from Rawhide[1] I've searched
>> >> the mailinglist archives for more details but didn't find anything.
>> >> So, what are your plans?
>> >>
>> >> [1] 
>> >> https://lists.fedoraproject.org/pipermail/kernel/2015-October/006011.html
>> >
>> > As that link said, based on the result of the code being in Rawhide, it
>> > is now being reworked / redesigned.  The result will be posted for
>> > review "when it's ready".
>>
>> If you rework/redesign something you have to know what you want to change.
>> That's why I was asking for the plan...
>
> Since when do people post "plans" or "design documents" on lkml without
> real code?  Again, code will be posted when it's ready, like any other
> kernel submission.

I ask for feedback on ideas and designs on a fairly regular basis.  I
even frequently get valuable feedback :)

I would like to think that the kernel community would have something
of value to add to the process of designing and implementing a major
new IPC mechanism.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Add ambient capability support to execution environment config?

2015-10-08 Thread Andy Lutomirski
For non-root services, getting Capabilities= and CapabilityBoundingSet= to
do anything useful is rather tricky.  Would it make sense to add
AmbientCapabilities= to set ambient (and, implicitly, inheritable)
capabilities, which will be available in Linux 4.3?

Alternatively, there could be a boolean option to change the meaning of
Capabilities so that it uses ambient capabilities instead of whatever it
currently does.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-20 Thread Andy Lutomirski
On Apr 20, 2015 7:57 AM, Lennart Poettering lenn...@poettering.net
wrote:

 On Fri, 17.04.15 09:14, Andy Lutomirski (l...@amacapital.net) wrote:

  My point here is that there's no real shortage of downsides to this
  scheme, and there still appears to be little to no benefit.

 Well, let's turn this around. You seem to really dislike caps. And you
 vaguely claim security holes, which we have shown know don't
 exist. So, now, can you clearly explain why precisely you dislike them
 so much still?  And something more technical then systemd shouldn't
 use them or i don't like them, or they should only be used in the
 kernel, because these are not technical reasons, they are just claims
 of personal taste.

 I will grant you that they aren't particularly expressive, and I will
 grant you that one day there might be better concepts. But that's not
 a strong reason not to support them really, that's just a reason to
 later add support for something better.

Technical reasons:

1. They can't be usefully delegated to a namespace.

2. The set of caps that exist is controlled by the kernel, whereas the set
of dbus methods is large and controlled by userspace.  Caps can't scale to
accommodate flexble userspace policies.

3. They don't appear to add value, and avoiding unnecessary complexity is
good.

4. They suck.  This is a technical issue -- the kernel doesn't allow
flexible assignments of caps to processes.  This is a problem for kernel
API users and it will be a problem for you.

--Andy


 Lennart

 --
 Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-20 Thread Andy Lutomirski
On Apr 20, 2015 9:07 AM, Lennart Poettering lenn...@poettering.net
wrote:

 On Mon, 20.04.15 08:51, Andy Lutomirski (l...@amacapital.net) wrote:

 I will grant you that they aren't particularly expressive, and I
will
 grant you that one day there might be better concepts. But that's
not
 a strong reason not to support them really, that's just a reason
to
 later add support for something better.
   
Technical reasons:
   
1. They can't be usefully delegated to a namespace.
  
   Not sure I can parse that. If you use the bus to communicate across
   namespace boundaries then each side lacks caps for the other. Great,
   that's how it should be. And same as for uid checks btw... if a uid
   cannot be translated, then it will not be passed!
 
  No.   You're right for nspawn-style namespaces, but not for namespaces
more
  broadly.  Namespaces are a great way to drop various privileges, but
your
  use of caps prevents certain uses (e.g. confining your hypothetical
  time-setting daemon in a namespace).

 I have seen no use of userns for sandboxing normal daemons so far. I
 have seen tons of daemons using caps for such sandboxing.

Sandstorm.io

I bet Chromium will follow suit, and don't forget Docker and similar tools.


 maybe if userns in its current iteration doesn't mix as well as you'd
 like it with caps this is indication that userns might need some more
 polish, and not that caps are necessarily the bad guy here?


Nope, userns works just fine.

 I mean, as the one who added most of the sandboxing features we expose
 in systemd .service files currently (including things like
 ReadOnlyDirectories=, PrviateTmp=, PrivateNetwork= which are all based
 on kernel namespacing), I completely fail to see how we could even
 expose user namespace like this in a good way that would hold water?
 Capability-based sandboxing on the other hand is much easier to
 handle, and in many cases a highly efficient way to sandbox stuff. Two
 examples:

There's a world outside systemd and, in that world, programs would like to
be able to sandbox themselves.  Userns is wonderful for that purpose.


 systemd-networkd drops privileges, becomes the systemd-network
 user, only retains CAP_NET_ADMIN. That's actually already a really
 good sandbox!

 systemd-timesyncd drops privileges, becomes the systemd-timesync
 user, only retains CAP_SYS_TIME. And that's actually a really good
 sandbox too!

 Both daemons are network facing, hence sandboxing things like this is
 actually of quite some importance, and it *works*! Today! And it is
 easy! easy enough for most administrators to grok it easily! And
 because of that it is actually *good* security!

Except that if it's too coarse-gained, it fails to actually separate
privileges.


 And please don't discount these two daemons as irrelevant
 examples. They are highly relevant, since they run on a good chunk of
 Linux systems, as one of the few daemons that run really everywhere.

 Caps *do* have good uses, please don't claim otherwise. They are a
 *lot* more relevant on today's system than userns at least!

 (Note that I am not saying that userns are really a bad idea or so, I
 just would like to ask you to keep things in perspective: caps are
 *good* -- even though they could be much better. And they are a ton
 more useful and used than userns right now)

   OK, they are not very expressive, I granted you that already. But they
   are still more expressive than uid == 0.
  
   That they are not expressive is something I can agree with, as
   mentioned above, but I don't consider this a real issue not to using
   them. I mean, it would be great if we had something better in the
   kernel, like capsicum or whatever, but we don't. And since caps are
   pretty well supported otherwise on Linux, and they are better then
   simple uid == 0 checks, I think they should be supported by kdbus too.
 
  This is a bad excuse.  Given that you're designing something new, you
have
  plenty of room to do better.

 We are not really in the business in designing comprehensive new
 access control systems that can be used for in-kernel and in-userspace
 subsystems.

 Sure, we also have bus policy, but that given it's non-interactive
 logic it's not really suitable for the uses where we need uid or caps
 checks, i.e. dynamic access control within called methods that need to
 check different privileges dynamically.

 Lennart

 --
 Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-20 Thread Andy Lutomirski
On Apr 20, 2015 8:22 AM, Lennart Poettering lenn...@poettering.net
wrote:

 On Mon, 20.04.15 08:08, Andy Lutomirski (l...@amacapital.net) wrote:

  On Apr 20, 2015 7:57 AM, Lennart Poettering lenn...@poettering.net
  wrote:
  
   On Fri, 17.04.15 09:14, Andy Lutomirski (l...@amacapital.net) wrote:
  
My point here is that there's no real shortage of downsides to this
scheme, and there still appears to be little to no benefit.
  
   Well, let's turn this around. You seem to really dislike caps. And you
   vaguely claim security holes, which we have shown know don't
   exist. So, now, can you clearly explain why precisely you dislike them
   so much still?  And something more technical then systemd shouldn't
   use them or i don't like them, or they should only be used in the
   kernel, because these are not technical reasons, they are just claims
   of personal taste.
  
   I will grant you that they aren't particularly expressive, and I will
   grant you that one day there might be better concepts. But that's not
   a strong reason not to support them really, that's just a reason to
   later add support for something better.
 
  Technical reasons:
 
  1. They can't be usefully delegated to a namespace.

 Not sure I can parse that. If you use the bus to communicate across
 namespace boundaries then each side lacks caps for the other. Great,
 that's how it should be. And same as for uid checks btw... if a uid
 cannot be translated, then it will not be passed!

No.   You're right for nspawn-style namespaces, but not for namespaces more
broadly.  Namespaces are a great way to drop various privileges, but your
use of caps prevents certain uses (e.g. confining your hypothetical
time-setting daemon in a namespace).


  2. The set of caps that exist is controlled by the kernel, whereas the
set
  of dbus methods is large and controlled by userspace.  Caps can't scale
to
  accommodate flexble userspace policies.

 OK, they are not very expressive, I granted you that already. But they
 are still more expressive than uid == 0.

 That they are not expressive is something I can agree with, as
 mentioned above, but I don't consider this a real issue not to using
 them. I mean, it would be great if we had something better in the
 kernel, like capsicum or whatever, but we don't. And since caps are
 pretty well supported otherwise on Linux, and they are better then
 simple uid == 0 checks, I think they should be supported by kdbus too.

This is a bad excuse.  Given that you're designing something new, you have
plenty of room to do better.


  3. They don't appear to add value, and avoiding unnecessary complexity
is
  good.

 Well, I disagree on this. I think they are better because more
 fine-grained than uid == 0 checks.

  4. They suck.  This is a technical issue -- the kernel doesn't allow
  flexible assignments of caps to processes.  This is a problem for kernel
  API users and it will be a problem for you.

 Not a technical reason, unlike you claim. Just a personal taste issue.

Sure it is.  Caps are defined in the kernel sources and, as seems to have
been covered pretty well in this thread, the list of caps is very far from
what a userspace dbus service would want to check.

--Andy


 Honestly, I don't think the issues you raise are very convincing

 Lennart

 --
 Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-17 Thread Andy Lutomirski
On Apr 17, 2015 4:53 AM, Djalal Harouni tix...@opendz.org wrote:

 Hi Andy,

 On Thu, Apr 16, 2015 at 12:30:28PM -0700, Andy Lutomirski wrote:
  On Thu, Apr 16, 2015 at 11:23 AM, Lennart Poettering
  lenn...@poettering.net wrote:
 [...]
  AFAICT this piece of kdbus code serves to enable a rather odd way to
  write privilege-separated services to change the time and kill
  processes.  The cost is complex security code that, at best, fails
  secure in the presence of different user namespaces, and the cost also
  involves touching a global refcount for each message sent (this might
  be the *only* thing that would reference init_user_ns's refcount when
  sending).  Oh yeah, the cost is also ABI crap -- if, say, my
 The global ref-counts on metadata is just a workaround due to userns and
 caps. I actually thought we already sorted that out?

  https://lkml.org/lkml/2015/3/25/702

 Hmm there are other paths that refs user_ns, the mqueue notification
 perhaps ?

 Please note that we also have _per_ user quota accounting, the trade off
 of accouting prevents further performance penalties on other bus
 operations. Referring to performance critical code, this code path can
 just be ignored by to not opt-in for KDBUS_ATTACH_CAPS which is the
 default behaviour.

Quoting that link:

 It's conditional on KDBUS_ATTACH_CAPS, anyway.

Fair enough.

[end quote]

I don't believe it'll be usefully conditional.  Systemd is pretty
clearly planning on using it, so you get a silly, if small,
performance hit.

My point here is that there's no real shortage of downsides to this
scheme, and there still appears to be little to no benefit.

--Andy


 Thanks!

 --
 Djalal Harouni
 http://opendz.org
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-17 Thread Andy Lutomirski
On Apr 17, 2015 5:42 AM, Simon McVittie
simon.mcvit...@collabora.co.uk wrote:

 On 16/04/15 15:52, Andy Lutomirski wrote:
  (I really think this dichotomy
  needs to be removed, *especially* since it looks like code already
  exists to try to use both metadata sources.  This seems like it's just
  asking for security screw-ups.)

 Would it address this concern if there was an explicit API separation
 into things that can't be faked, suitable for authorization and
 things that could have been faked, only suitable for debugging - such
 that the uid would be in the first set only, capabilities would be in
 the first set on kdbus but absent or in the second set on traditional
 D-Bus, and /proc/*/cmdline would always be in the second set?

It would certainly improve the sd-bus code, I think.  I'm not a
systemd developer, though.

From the kernel side, I don't even see the point of reporting caps for
debugging IPC things.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-17 Thread Andy Lutomirski
On Apr 17, 2015 6:05 AM, Cristian Rodríguez crrodrig...@opensuse.org wrote:

 On Fri, Apr 17, 2015 at 7:51 AM, Lennart Poettering
 lenn...@poettering.net wrote:

  Groups *suck* as authentication scheme. If you add one group for each
  privilege you want, then you'll have a huge number of groups, and
  that's hardly desirable. It's pretty close to being unmanagable with
  user/group editors. Also, you can never take group membership away,
  since users who once where members of group can create sgid binaries
  which allows them to always return into that group forever.

 Not to mention, we are running out of system users and groups in
 distributions (if we didn't already) and some people want us to
 provide fixed UID/GID system users
 across distributions for clustering applications...this is a totally
 unworkable way forward.

I believe you're arguing that you think gids are a scarse resource, so
you want to save ~2 gids (certainly fewer than 64) by inventing a
whole new userspace authorization scheme using *caps* that doesn't
even solve the problem that you want to solve.

I'm not sure how this is supposed to justify anything.  Caps are
probably the single least scalable authorization mechanism you could
come up with.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 3:23 AM, Tom Gundersen t...@jklm.no wrote:
 Hi Andy,

 On Thu, Apr 16, 2015 at 2:55 AM, Andy Lutomirski l...@amacapital.net wrote:
 Yesterday, I discovered SD_BUS_VTABLE_CAPABILITY.  Are there any
 examples in which it does anything?

 Please note that you need to be using kdbus to get any capabilities
 transported, so in dbus1 this does nothing (as on dbus1 using
 capabilities would be racy). Are you testing this on kdbus?

No.

I'm looking at sd_bus_query_sender_privilege, which does:

r = sd_bus_query_sender_creds(call,
SD_BUS_CREDS_UID|SD_BUS_CREDS_EUID|SD_BUS_CREDS_EFFECTIVE_CAPS,
creds);

That, in turn, does:

if (!c || !(c-mask  SD_BUS_CREDS_PID)) {
/* We couldn't read anything from the call, let's try
 * to get it from the sender or peer */

if (call-sender)
return sd_bus_get_name_creds(call-bus,
call-sender, mask, creds);
else
return sd_bus_get_owner_creds(call-bus, mask, creds);
}

return bus_creds_extend_by_pid(c, mask, creds);

The sd_bus_get_name_creds call seems questionable to me -- isn't that
trying to augment missing information from the time of call with
metadata from the sender's connection?  (I really think this dichotomy
needs to be removed, *especially* since it looks like code already
exists to try to use both metadata sources.  This seems like it's just
asking for security screw-ups.)

The sd_bus_get_owner_creds call looks totally bogus.  I don't know
when it would be called, but I hope the answer is provably never in
response to a message from an unprivileged user, in which case I'd
wonder why it's there.

The bus_creds_extend_by_pid is what I'm asking about.  When is it used?

In any event, under certain (strewn-across-multiple-files with
duplicated code in various places) conditions, all three paths above
seem to land in bus_creds_add_more.

It looks like some of this is gated on whether the sending PID is
known (IMO this is odd -- it should be gated by whether you think it's
a good idea, not whether a piece of information is known), but maybe
mac_selinux_generic_access_check can circumvent that.

So why exactly are capabilities only used with kdbus?  I don't see it,
and I do see code that tries to extract them from /proc.


 If so, I don't suppose any of you
 could give me an example of:

 $ cp `which dbus-send` .
 $ sudo setcap all=eip dbus-send
 $ dbus-send [not sure what goes here]

 that passes an authentication test that would have failed without the setcap?

 Note that dbus-send is the old dbus1 command, and will always go via
 the bus proxy that connects dbus1 clients to kdbus. As such, it will
 only carry the credentials that are available safely on dbus1, and
 hence not capabilities. So even if you do use kdbus on your machine,
 this example would not work.

 To be clear, the reason we do not use the capability logic on dbus1,
 but only on kdbus is as follows: On kdbus, the capabilities are
 attached to the method call itself, hence we safely know that at the
 time the method call was issued by the client side, that client
 actually really had those capabilities. On dbus1 however, which uses
 AF_UNIX/SOCK_STREAM, this is not possible: we only get the
 UID/GID/PID, hence we could only derive the caps from
 /proc/$PID/status. But this opens this up for a vulnerability: if a
 client quickly issues a method call, and then exec()s a suid binary,
 it could happen that the service side would read the capabilities from
 that SUID process, and not the original process, and the exploit was
 successful. We cannot allow that, hence no capabilities on dbus1,
 instead we open up things to a much, much broader uid == 0. kdbus
 allows us to be much stricter there, by allowing us to check only one
 specific capability.

Agreed, and that's not the only vulnerability.  But code to do it
certainly exists in systemd.


 An equivalent example to the one you gave using native kdbus would be:

 $ cp `which busctl` .
 $ sudo setcap all=eip busctl
 $ ./busctl call org.freedesktop.timedate1 /org/freedesktop/timedate1
 org.freedesktop.timedate1 SetTimezone sb Europe/London 0


I still don't see why, even if this were bug-free, it's even remotely
useful.  Keep in mind that basically no one has a capability if
they're not root, and dbus users should really use *dbus* mechanisms
to retain selected privileges when they drop other privileges.

--Andy

 In the interest of full disclosure, I'm asking because I think that
 one of two things is true:

 1. The SD_BUS_VTABLE_CAPABILITY code is useless and should therefore be 
 deleted.

 This is not the case.

 2. The SD_BUS_VTABLE_CAPABILITY code is exploitably buggy and should
 therefore be deleted.

 I have heard this claim from you many times, but so far I haven't been
 able to figure out what you have in mind. Could you either give an
 example of an exploit or at least the general scheme you

Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 8:59 AM, Lennart Poettering
lenn...@poettering.net wrote:
 On Thu, 16.04.15 07:52, Andy Lutomirski (l...@amacapital.net) wrote:

 I'm looking at sd_bus_query_sender_privilege, which does:

 r = sd_bus_query_sender_creds(call,
 SD_BUS_CREDS_UID|SD_BUS_CREDS_EUID|SD_BUS_CREDS_EFFECTIVE_CAPS,
 creds);

 That, in turn, does:

 if (!c || !(c-mask  SD_BUS_CREDS_PID)) {
 /* We couldn't read anything from the call, let's try
  * to get it from the sender or peer */

 if (call-sender)
 return sd_bus_get_name_creds(call-bus,
 call-sender, mask, creds);
 else
 return sd_bus_get_owner_creds(call-bus, mask, 
 creds);
 }

 return bus_creds_extend_by_pid(c, mask, creds);

 The sd_bus_get_name_creds call seems questionable to me -- isn't that
 trying to augment missing information from the time of call with
 metadata from the sender's connection?  (I really think this dichotomy
 needs to be removed, *especially* since it looks like code already
 exists to try to use both metadata sources.  This seems like it's just
 asking for security screw-ups.)

 This information is missing only for dbus1 clients connecting via the
 dbus1 compat proxy. In that case we need to fall back to connection
 credentials, because that's how authentication works on dbus1.

 The sd_bus_get_owner_creds call looks totally bogus.  I don't know
 when it would be called, but I hope the answer is provably never in
 response to a message from an unprivileged user, in which case I'd
 wonder why it's there.

 Messages with empty senders are only used for dbus1 *direct*
 connections. This is a special setup dbus1 supports where
 communication is not done via the bus daemon, but via a direct AF_UNIX
 connection between two peers. It's very seldom used, and mostly a hack
 to avoid the performance penalty that the dbus-daemon means. These
 direct connections are entirely redundant in a kdbus world, and you
 better shouldn't think about them.

 Ultimately, this code exists in sd-bus for one reason only: in a
 non-kdbus world systemd must be accessible even if dbus-daemon is not
 up. Since dbus-daemon is only started relatively late during boot
 there's a window in early boot where we need to be accessible by some
 other means: for that we allow privileged clients to connect to us
 directly. We then still speak the dbus protocol, and offer similar
 APIs, but we do not use the bus topology. In kdbus all of that goes
 away, since kdbus is available all the time, since earliest time of
 boot.

 For such direct dbus1 connections it's the creds of the peer of the
 dbus connection that we need to authenticate, that is exposed as
 owner creds.

 Or in other words, the if check above you can read like this:

 if (we_get_a_full_kdbus_message)
 use_the_attached_creds();
 else {
 if (we_get_a_compat_dbus1_message_via_kdbus_or_native_dbus1)
 use_the_senders_creds()
 else if (we_get_a_message_via_dbus_direct_connection())
 use_the_peer_creds();
 }

 To make this easier to understand I'll add some comments to this code:

 http://cgit.freedesktop.org/systemd/systemd/commit/?id=2d0c1561340efff3265fe89b05eae4ee8f4037a7

 The bus_creds_extend_by_pid is what I'm asking about.  When is it
 used?

 It's a noop, unless people OR in SD_BUS_CREDS_AUGMENT into the flags
 of creds they want. Doing this basically voids your warranty: it means
 that the creds data shall be augmented with data from /proc, which are
 good enough for logging, debugging things, but not for security
 relevant things, such as authorizing message calls.

OK, I finally found the relevant AUGMENT check.


 Our own message authorization call does not make use of this flag.


mac_selinux_generic_access_check does.  It also uses sender metadata,
which I find interesting given that you seem to strongly prefer using
message metadata.

 Also see the comment about this in sd-bus.h:

 http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h?id=2d0c1561340efff3265fe89b05eae4ee8f4037a7#n89

 Or check the man page for busctl that explains this too:

 http://www.freedesktop.org/software/systemd/man/busctl.html#--augment-creds=BOOL

 In any event, under certain (strewn-across-multiple-files with
 duplicated code in various places) conditions, all three paths above
 seem to land in bus_creds_add_more.

 It looks like some of this is gated on whether the sending PID is
 known (IMO this is odd -- it should be gated by whether you think it's
 a good idea, not whether a piece of information is known), but maybe
 mac_selinux_generic_access_check can circumvent that.

 Well, the augmentation code needs the PID to read the data from /proc,
 that's what it does after all...

 So why exactly are capabilities only used with kdbus?  I don't see it,
 and I do see code that tries to extract them from /proc

Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 9:43 AM, Tom Gundersen t...@jklm.no wrote:
 On Thu, Apr 16, 2015 at 4:52 PM, Andy Lutomirski l...@amacapital.net wrote:
 Unshare your user namespace, set things up right, and systemd
 or any other server will see you as having all capabilities.  You've
 fixed that in kdbus, but you haven't (and probably can't!) fix it in
 the legacy code, and that legacy code is still there (!).

 The dbus1 code (which I assume you mean when you say legacy code)
 does not make use of capabilities, and it should not (see Lennart's
 answer for all the details). If anything, this should be an argument
 to move to kdbus with native, race-free capability-passing support.

 Do I understand correctly, that any concerns you had are about systemd
 and its dbus1 compat code (which of course we should take seriously
 too), and that you no longer see any security vulnerabilities in the
 capability related code of kdbus?

 The ratio of complexity of capability code the kdbus folks have
 already written (hundreds of lines across multiple files) to its
 utility (very near zero AFAICT) is, in my book, not a good sign at
 all.

 We have several uses of this, see my mail to Jiri regarding
 CAP_SYS_BOOT for instance:
   https://lkml.org/lkml/2015/4/16/219

I read that, but I disagree with you.

CAP_SYS_BOOT is the privilege to directly hard-reboot the system, not
the privilege to initiate a clean reboot.

Keep in mind that, on some recent Windows versions, for the most part
you *can't* directly hard-reboot the system; instead you have to give
the OS a reason so the OS can log it.  IOW, the high-level Windows
reboot permission doesn't confer the privilege of directly
hard-rebooting.

Maybe systemd or GNOME will want to do that some day.


 However, what we are trying to get to the bottom of is if you see any
 technical problems with the current kdbus capability handling code. My
 understanding is that you don't.


I have a technical problem with it: it's a design that has
insufficient justification.  It also seems like it will be quite
limiting in the future, *especially* wrt user namespaces.

I agree that it's probably not exploitable *if used carefully* in the
latest kdbus code.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 10:43 AM, Tom Gundersen t...@jklm.no wrote:
 On Thu, Apr 16, 2015 at 5:57 PM, Andy Lutomirski l...@amacapital.net wrote:
 We have several uses of this, see my mail to Jiri regarding
 CAP_SYS_BOOT for instance:
   https://lkml.org/lkml/2015/4/16/219

 I read that, but I disagree with you.

 CAP_SYS_BOOT is the privilege to directly hard-reboot the system, not
 the privilege to initiate a clean reboot.

 My main point that being allowed to do a hard-reboot, but not to do a
 clean reboot, makes no sense.

 However, what we are trying to get to the bottom of is if you see any
 technical problems with the current kdbus capability handling code. My
 understanding is that you don't.


 I have a technical problem with it: it's a design that has
 insufficient justification.  It also seems like it will be quite
 limiting in the future, *especially* wrt user namespaces.

 What I was really asking was if you see any actual vulnerabilities.
 That we have a different technical opinion on the design of this is a
 different matter.

 I agree that it's probably not exploitable *if used carefully* in the
 latest kdbus code.

 It would be very helpful if you could go into details on why you think
 more care is needed here than for other things. Is there anything
 non-trivial going on here that I'm missing? The way capabilites are
 exposed through kdbus seems perfectly straight-forward to me, so I'm
 really trying to get my head around your concerns here.

 So, let me ask explicitly again:

 * Are there any actual exploitable vulnerabilities you are aware
 of in the kdbus kernel code?

 * Are there any actual exploitable vulnerabilities on the sd-bus
 userspace code you are aware of, regardless if in the kdbus or dbus1
 support in it?

 If you are, please provide us with good explanations, so that we can
 do something about them. We'd love to fix this!

I don't see a vulnerability in the code as such.  But you still
haven't given a real use case, and vulnerabilities in security
mechanisms are often found in the interactions between components.
There doesn't seem to be a big picture here.

Can you give a plausible real world example along the lines of
process A, which has capability B for some decent reason, will send
message C to daemon D, and D will permit the request because of
capability B, but the request would not otherwise have been
permitted.

So far it seems like there's a bunch of code that may not be
exploitable but doesn't really do anything either.

And, honestly, if this is only about rebooting and setting the time,
then, even if correct, this is seriously not worth the complexity even
if it's completely sensible and correct.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 10:30 AM, Lennart Poettering
lenn...@poettering.net wrote:
 On Thu, 16.04.15 09:53, Andy Lutomirski (l...@amacapital.net) wrote:

  It's a noop, unless people OR in SD_BUS_CREDS_AUGMENT into the flags
  of creds they want. Doing this basically voids your warranty: it means
  that the creds data shall be augmented with data from /proc, which are
  good enough for logging, debugging things, but not for security
  relevant things, such as authorizing message calls.

 OK, I finally found the relevant AUGMENT check.

 
  Our own message authorization call does not make use of this flag.
 

 mac_selinux_generic_access_check does.  It also uses sender metadata,
 which I find interesting given that you seem to strongly prefer using
 message metadata.

 It certainly does. But this is not actually used for authentication
 either, ly, because the auth code in there only cares for the
 selinux label, which is available as native connection metadata on
 dbus1 already. The augmentation is used because the selinux guys want to
 generate a nice log message when the permission is not granted.

 The selinux code in systemd was actually contributed by the selinux
 people a long time ago. It contained code to read data from /proc
 manually for showing in the messages. I recently reworked this to just
 use SD_BUS_CREDS_AUGMENT instead, so that reading data from /proc is
 at least unified at one place. This is safe only because we know that
 the selinux data is not augmented, but available anyway and safely,
 and we rely on that.

 You can take the selinux code in system as a hint, that getting
 metadata about bus requests that one can log about is nothign we made
 up, it's very real, and our selinux code added that years ago, and was
 written by the selinux guys, not us...

 I will grant you though that it is confusing that we use
 SD_BUS_CREDS_AUGMENT here like this, and implicitly rely on that the
 selinux label is not a field that is being augmented. We should make
 this explicit, absolutely. I'll now add some code that will make this
 assumption explicit and fails early if the selinux label happens to be
 augmented. Of course in real-life this is impossible to trigger, but
 it's certainly helps understanding the code.

  Agreed, and that's not the only vulnerability.  But code to do it
  certainly exists in systemd.
 
  Not the only vulnerability? I am not aware of any vulnerability, can
  you explain?

 See below and attached.

 
   $ cp `which busctl` .
   $ sudo setcap all=eip busctl
   $ ./busctl call org.freedesktop.timedate1 /org/freedesktop/timedate1
   org.freedesktop.timedate1 SetTimezone sb Europe/London 0
  
 
  I still don't see why, even if this were bug-free, it's even remotely
  useful.  Keep in mind that basically no one has a capability if
  they're not root, and dbus users should really use *dbus* mechanisms
  to retain selected privileges when they drop other privileges.
 
  Well, some of systemd's own deamons run with an uid != 0, but some
  caps left.
 
  And avahi (as a daemon I know very well, that is not part of systemd)
  does that too, since 10 years actually... It's not that uncommon among
  system daemons really.

 And does it use those caps for the purpose of authenticating to dbus
 services?

 No, it does not. Because it's unsafe to do that on dbus1. As mentioned
 before, kdbus makes this safe.

 If so, perhaps you should reconsider allowing open dbus connections to
 act as capabilities (the real kind, not the Linux capability crap)
 themselves.

 No, they do not use capabilities for bus authentication, since they
 predate kdbus, and acquiring them on dbus1/af_unix is not safe.

  Sure.  Unshare your user namespace, set things up right, and systemd
  or any other server will see you as having all capabilities.  You've
  fixed that in kdbus, but you haven't (and probably can't!) fix it in
  the legacy code, and that legacy code is still there (!).  (Actually,
  that code seems to have been actively developed as recently as
  November 2014.)
 
  Hmm? I do not understand what you are referring to.

 See the attached code.  If you LD_PRELOAD it and send a message that
 causes the receiver to read CapEff, then *boom*.  So either:

 a) all the code to read CapEff in the systemd tree doesn't do anything, or
 b) it's wrong, or
 c) it's only used for diagnostics

 As an illustration, LD_PRELOAD the attachment into sleep 1h, then
 cat /proc/`pidof sleep`/status.  Note the rather large CapEff value :)
  Again, this has nothing to do with kdbus, and races aren't needed.

 I don't see how this is a vulnerability. Reading stuff from /proc is
 unsafe. Hence we do not use it for authentication. We all know that. I
 am really not sure what you are trying to prove here.

  Can you please explain how precisely you think that sd-bus or systemd
  or the way they use capabilities is exploitable in any way? You keep
  claiming that, but I never have seen more than vague words about

Re: [systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-16 Thread Andy Lutomirski
On Thu, Apr 16, 2015 at 11:23 AM, Lennart Poettering
lenn...@poettering.net wrote:
 On Thu, 16.04.15 10:52, Andy Lutomirski (l...@amacapital.net) wrote:

 
  It would be very helpful if you could go into details on why you think
  more care is needed here than for other things. Is there anything
  non-trivial going on here that I'm missing? The way capabilites are
  exposed through kdbus seems perfectly straight-forward to me, so I'm
  really trying to get my head around your concerns here.
 
  So, let me ask explicitly again:
 
  * Are there any actual exploitable vulnerabilities you are aware
  of in the kdbus kernel code?
 
  * Are there any actual exploitable vulnerabilities on the sd-bus
  userspace code you are aware of, regardless if in the kdbus or dbus1
  support in it?
 
  If you are, please provide us with good explanations, so that we can
  do something about them. We'd love to fix this!

 I don't see a vulnerability in the code as such.  But you still
 haven't given a real use case, and vulnerabilities in security
 mechanisms are often found in the interactions between components.
 There doesn't seem to be a big picture here.

 Can you give a plausible real world example along the lines of
 process A, which has capability B for some decent reason, will send
 message C to daemon D, and D will permit the request because of
 capability B, but the request would not otherwise have been
 permitted.

 So far it seems like there's a bunch of code that may not be
 exploitable but doesn't really do anything either.

 And, honestly, if this is only about rebooting and setting the time,
 then, even if correct, this is seriously not worth the complexity even
 if it's completely sensible and correct.

 Well, you asked for examples, I gave you two simple, existing
 ones.

 Note that the caps-based method authorization is not much used
 currently, especially not outside of systemd, because it's not safely
 doable on dbus1, but only on kdbus, and kdbus is not merged into the
 kernel yet, hence not many projects use it yet.

 But anyway, here are two more example in systemd itself:

 systemd itself checks CAP_SYS_KILL for clients asking to kill
 arbitrary services (which means invoking kill() to all PIDs in the
 service's cgroup).

 Similar to this, logind checks CAP_SYS_KILL for clients asking to kill
 sessions (which means invoking kill() for all PIDS in the session).


These aren't the kinds of examples I'm asking about.  They're what
systemd does, not what use case makes doing that serve a purpose..

 Now, to put together a more complex scenario for you: consider a small
 web UI that can be used to set the system time. It should realy run at
 minimal privileges, after all it has a surface to the web. Hence you
 write it as daemon, make it run as a user id of its own, set up a
 chroot() or a file system namespace, but you do keep CAP_SYS_TIME and
 a bus connection open.

Aha, a real example!

But this is a bad example -- systemd should provide the web server
with a way to retain the right to issue the one method call that it
needs.  CAP_SYS_TIME is overbroad.  For example, CAP_SYS_TIME also
grants the service to bypass whatever auditing and logging systemd
would do.

And this is also a worthless design pattern, as it seems to be
applicable to sending signals and changing the time, and probably very
little else.

 With that setup the web daemon can pretty much
 only set the system clock, and if it exploited cannot be used for much
 else. It will not have access to /dev/rtc, due to the file system
 namespace, but it can nicely set the system clock via timedated still,
 and pretty much only that, and the clock will be synced to the RTC by
 it.

It can set the time with settimeofday(2), which is undesirable.


 To map this back to your earlier request for an example. In this case
 process A is the web UI daemon. Capability B is CAP_SYS_TIME. Message
 C is the SetTime() bus call. Daemon D is timedated.

 If the web UI daemon would not have CAP_SYS_TIME it couldn't change
 the time like this -- unless of course that web UI daemon would run as
 UID 0 all the time, which is certainly much much less desirable, given
 that it is a network facing daemon.

 And if you are not happy with this example, there are numerous others
 like this of course...

Honestly, I doubt that.  There are a smallish number of capabilities
and a much smaller number that make sense in this manner.

Here's what I think is going on: if I propose a kernel patch to change
the kernel to let my app do such-and-such, and I convince everyone
involved that the patch is bug-free, maintainable, cleanly written,
secure, etc, that's still not enough justification to merge it,
especially if it's an ABI change.  I also need to justify why it's a
good idea for my app to do whatever I'm enabling and why the kernel
should support it in that way.

AFAICT this piece of kdbus code serves to enable a rather odd way to
write privilege-separated services

[systemd-devel] SD_BUS_VTABLE_CAPABILITY

2015-04-15 Thread Andy Lutomirski
Hi all-

Yesterday, I discovered SD_BUS_VTABLE_CAPABILITY.  Are there any
examples in which it does anything?  If so, I don't suppose any of you
could give me an example of:

$ cp `which dbus-send` .
$ sudo setcap all=eip dbus-send
$ dbus-send [not sure what goes here]

that passes an authentication test that would have failed without the setcap?

In the interest of full disclosure, I'm asking because I think that
one of two things is true:

1. The SD_BUS_VTABLE_CAPABILITY code is useless and should therefore be deleted.

2. The SD_BUS_VTABLE_CAPABILITY code is exploitably buggy and should
therefore be deleted.

I can't tell which one, since I haven't figured out how to test it
realistically in the first place.  Most of the protected calls seem to
be heavily restricted by dbus policy.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Thu, Jan 22, 2015 at 6:29 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering
 lenn...@poettering.net wrote:
 On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote:

 Hi all-

 When running virtme (a simple vm gadget) on Fedora 21, the slowest
 part of bootup by far appears to be systemd-vconsole-setup:

 # time /usr/lib/systemd/systemd-vconsole-setup
 putfont: PIO_FONT trying ...
 ...
 setfont: putfont: 512,8x16:  failed: -1
 putfont: PIO_FONT: Invalid argument
 /usr/bin/setfont failed with error code 71.

 setfont is not part of systemd, we just invoke it. If that fails, this
 is a problem somewhere between the VM, the kernel and console-tools.


 Aha -- I missed that systemd-vconsole-setup calls setfont.  I can
 trigger the same problem by just typing setfont.  For whatever reason,
 my other Fedora 21 computer only has this problem if I type setfont
 and not if I run systemd-vconcole-setup.

 My uneducated guess is that your virtual machine boots up with a
 non-graphical console, and the tool thus tries to upload the fonts
 into the good old VGA hw text mode glyph tables, and qemu is very slow
 at that... Or something like that.

 setfont is doing this:

 nanosleep({0, 25000}, NULL) = 0
 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
 write(2, ., 1.)= 1
 nanosleep({0, 25000}, NULL) = 0
 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
 write(2, ., 1.)= 1
 nanosleep({0, 25000}, NULL) = 0
 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
 write(2, ., 1.)= 1
 nanosleep({0, 25000}, NULL) = 0
 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
 write(2, ., 1.)= 1
 nanosleep({0, 25000}, NULL) = 0
 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
 write(2, ., 1.)= 1

 This thing has only a serial console:

 # cat /proc/consoles
 ttyS0-W- (EC   a)4:64

 setfont does this:

 /* we allow ourselves to hang here for ca 5 seconds, xdm may
 be playing tricks on us. */
 while ((loop++  20)  (i = ioctl(fd, PIO_FONT, buf)))
   {
 if (loop = 1)
   fprintf(stderr, putfont: PIO_FONT trying ...\n);
 else
   fprintf(stderr, .);
 usleep(25);
   }
 fprintf(stderr, \n);

 Alexey, would it make sense to remove this loop or to add a way to turn it 
 off?

Ping, everyone?

This issue still exists.  AFAICT systemd is relying on a really old
tool, that that really old tool (setfont) is sometimes delaying boot
by a very large amount.  Can we either fix the tool (Alexey) or stop
using it (systemd people)?

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Wed, Apr 1, 2015 at 12:32 PM, Kay Sievers k...@vrfy.org wrote:
 On Wed, Apr 1, 2015 at 8:56 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Thu, Jan 22, 2015 at 6:29 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering
 lenn...@poettering.net wrote:
 On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote:

 Hi all-

 When running virtme (a simple vm gadget) on Fedora 21, the slowest
 part of bootup by far appears to be systemd-vconsole-setup:

 # time /usr/lib/systemd/systemd-vconsole-setup
 putfont: PIO_FONT trying ...
 ...
 setfont: putfont: 512,8x16:  failed: -1
 putfont: PIO_FONT: Invalid argument
 /usr/bin/setfont failed with error code 71.

 setfont is not part of systemd, we just invoke it. If that fails, this
 is a problem somewhere between the VM, the kernel and console-tools.


 Aha -- I missed that systemd-vconsole-setup calls setfont.  I can
 trigger the same problem by just typing setfont.  For whatever reason,
 my other Fedora 21 computer only has this problem if I type setfont
 and not if I run systemd-vconcole-setup.

 My uneducated guess is that your virtual machine boots up with a
 non-graphical console, and the tool thus tries to upload the fonts
 into the good old VGA hw text mode glyph tables, and qemu is very slow
 at that... Or something like that.

 setfont is doing this:

 nanosleep({0, 25000}, NULL) = 0
 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
 write(2, ., 1.)= 1
 nanosleep({0, 25000}, NULL) = 0
 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
 write(2, ., 1.)= 1
 nanosleep({0, 25000}, NULL) = 0
 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
 write(2, ., 1.)= 1
 nanosleep({0, 25000}, NULL) = 0
 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
 write(2, ., 1.)= 1
 nanosleep({0, 25000}, NULL) = 0
 ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
 write(2, ., 1.)= 1

 This thing has only a serial console:

 # cat /proc/consoles
 ttyS0-W- (EC   a)4:64

 setfont does this:

 /* we allow ourselves to hang here for ca 5 seconds, xdm may
 be playing tricks on us. */
 while ((loop++  20)  (i = ioctl(fd, PIO_FONT, buf)))
   {
 if (loop = 1)
   fprintf(stderr, putfont: PIO_FONT trying ...\n);
 else
   fprintf(stderr, .);
 usleep(25);
   }
 fprintf(stderr, \n);

 Alexey, would it make sense to remove this loop or to add a way to turn it 
 off?

 Ping, everyone?

 This issue still exists.  AFAICT systemd is relying on a really old
 tool, that that really old tool (setfont) is sometimes delaying boot
 by a very large amount.  Can we either fix the tool (Alexey) or stop
 using it (systemd people)?

 Hmm, why is the vm gadget you run configuring a custom console font
 at all? If there is no custom font specified in t he config, systemd
 will not run setfont.

It's not intentionally configuring a custom font, but it might be
inheriting Fedora's settings.


 Or did you mean to have vconsole-setup detect that it should not even
 try to run setfont? Not sure how to find that out.

 I don't really see how vconsole-setup could get rid of calling setfont
 from systemd, it is needed in many setups.

vconsole-setup could set the font itself instead of using setfont if
setfont can't be configured or fixed not to keep retrying for five
seconds (!).

Ideally, I think that setfont would just stop retrying on failure.  Or
perhaps all of this could go through udev or some other mechanism that
doesn't try to set the font until the device actually exists.  But the
console system is weird and may be that's hard.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Wed, Apr 1, 2015 at 2:36 PM, Kay Sievers k...@vrfy.org wrote:
 On Wed, Apr 1, 2015 at 11:19 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Wed, Apr 1, 2015 at 1:53 PM, Kay Sievers k...@vrfy.org wrote:
 On Wed, Apr 1, 2015 at 10:45 PM, Andy Lutomirski l...@amacapital.net 
 wrote:
 On Apr 1, 2015 12:56 PM, Kay Sievers k...@vrfy.org wrote:

 Do you have an idea why the VM does not accept the custom font? If
 that is something obvious, and we can detect it, we could make
 vconsole-setup check for it. But then again, fixing setfont seems like
 the obvious fix here.

 I assume it's because the VM has no graphical console at all.

 We check the existence of the corresponding /dev/vcs%i, to check if
 the tty is allocated where we want to apply the font to. Do these
 devices exist on the running machine?

 Yes:

 # ls /dev/vcs*
 /dev/vcs   /dev/vcs2  /dev/vcs4  /dev/vcsa1  /dev/vcsa3
 /dev/vcs1  /dev/vcs3  /dev/vcsa  /dev/vcsa2  /dev/vcsa4

 Looking at the code, the vc_screen.c code seems to create those
 devices unconditionally.

 They should only get created when something accesses the corresponding
 tty. deallocvt(1) can kill unused ones and the device nodes should
 disappear.


deallocvt doesn't seem to kill those device nodes for me.

 And what does this say?
   grep . /sys/class/tty/tty0/active /sys/class/tty/console/active

 # grep . /sys/class/tty/tty0/active /sys/class/tty/console/active
 /sys/class/tty/tty0/active:tty1
 /sys/class/tty/console/active:ttyS0

 vcs1 has, roughly:

 early console in decompress_kernel
 Decompressing Linux... Parsing ELF... done.
 Booting the kernel.

 Now I'm wondering how that buffer came to be.

 In any event, some tracing of the code suggests that I have
 vga_video_type == VIDEO_TYPE_CGA, and that fails if (vga_video_type 
 VIDEO_TYPE_EGAM) in vgacon_font_set.

 Indeed, /proc/ioports has:

   03d4-03d5 : cga

 and dmesg says:

 [0.00] Console: colour *CGA 80x25

 I don't see this information in sysfs anywhere.  Perhaps checking for
 an active console and detecting -EINVAL from vgacon_font_get would
 work.

 Hmm, yeah, maybe we could try one of the font-related ioctls to find
 out if the driver supports that before we spawn setfont.

 /proc/fb is empty on this VM, so maybe that would help.  Grr, this
 stuff is really old and crufty.

 The offending qemu command line args appear to be -vga none -display
 none.  I assume I have CGA because it's the fallback case in
 vgacon.c if nothing matches.

 Hehe, blast from the past. :) If you give kvm a VGA device, it all works fine?

I just tried it.  setfont succeeds, and the VGA device matches
/dev/vcs's contents.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Wed, Apr 1, 2015 at 2:47 PM, Kay Sievers k...@vrfy.org wrote:
 On Wed, Apr 1, 2015 at 11:38 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Wed, Apr 1, 2015 at 2:36 PM, Kay Sievers k...@vrfy.org wrote:

 They should only get created when something accesses the corresponding
 tty. deallocvt(1) can kill unused ones and the device nodes should
 disappear.


 deallocvt doesn't seem to kill those device nodes for me.

 Seems to work here:

 # ls -l /dev/vcs[6789]
 crw-rw 1 root tty 7, 6 Apr  1 22:21 /dev/vcs6
 # cat /dev/tty7
 ^C
 # cat /dev/tty9
 ^C
 # ls -l /dev/vcs[6789]
 crw-rw 1 root tty 7, 6 Apr  1 22:21 /dev/vcs6
 crw-rw 1 root tty 7, 7 Apr  1 23:42 /dev/vcs7
 crw-rw 1 root tty 7, 9 Apr  1 23:42 /dev/vcs9
 # deallocvt 7
 # ls -l /dev/vcs[6789]
 crw-rw 1 root tty 7, 6 Apr  1 22:21 /dev/vcs6
 crw-rw 1 root tty 7, 9 Apr  1 23:42 /dev/vcs9
 # deallocvt 9
 # ls -l /dev/vcs[6789]
 crw-rw 1 root tty 7, 6 Apr  1 22:21 /dev/vcs6

Aha.  It seems that I have something holding tty1-tty4 open.  I'll fix
that on my end.  Will that make vconsole-setup stop calling setfont?
If so, that will indirectly solve my problem.  (Although... I don't
see why the presence or absence of in-use vts should affect font
loading.  Also, it seems like vcs1 shows up no matter what I do.)


 The offending qemu command line args appear to be -vga none -display
 none.  I assume I have CGA because it's the fallback case in
 vgacon.c if nothing matches.

 Hehe, blast from the past. :) If you give kvm a VGA device, it all works 
 fine?

 I just tried it.  setfont succeeds, and the VGA device matches
 /dev/vcs's contents.

 Ah, nice.

 If we figure out some dummy font-related call to check if the kernel
 supports font handling at all, we could just add that to
 vconsole-setup, I guess.

 Kay



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Wed, Apr 1, 2015 at 1:53 PM, Kay Sievers k...@vrfy.org wrote:
 On Wed, Apr 1, 2015 at 10:45 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Apr 1, 2015 12:56 PM, Kay Sievers k...@vrfy.org wrote:

 Do you have an idea why the VM does not accept the custom font? If
 that is something obvious, and we can detect it, we could make
 vconsole-setup check for it. But then again, fixing setfont seems like
 the obvious fix here.

 I assume it's because the VM has no graphical console at all.

 We check the existence of the corresponding /dev/vcs%i, to check if
 the tty is allocated where we want to apply the font to. Do these
 devices exist on the running machine?

Yes:

# ls /dev/vcs*
/dev/vcs   /dev/vcs2  /dev/vcs4  /dev/vcsa1  /dev/vcsa3
/dev/vcs1  /dev/vcs3  /dev/vcsa  /dev/vcsa2  /dev/vcsa4

Looking at the code, the vc_screen.c code seems to create those
devices unconditionally.


 And what does this say?
   grep . /sys/class/tty/tty0/active /sys/class/tty/console/active

# grep . /sys/class/tty/tty0/active /sys/class/tty/console/active
/sys/class/tty/tty0/active:tty1
/sys/class/tty/console/active:ttyS0

vcs1 has, roughly:

early console in decompress_kernel
Decompressing Linux... Parsing ELF... done.
Booting the kernel.

Now I'm wondering how that buffer came to be.

In any event, some tracing of the code suggests that I have
vga_video_type == VIDEO_TYPE_CGA, and that fails if (vga_video_type 
VIDEO_TYPE_EGAM) in vgacon_font_set.

Indeed, /proc/ioports has:

  03d4-03d5 : cga

and dmesg says:

[0.00] Console: colour *CGA 80x25

I don't see this information in sysfs anywhere.  Perhaps checking for
an active console and detecting -EINVAL from vgacon_font_get would
work.

/proc/fb is empty on this VM, so maybe that would help.  Grr, this
stuff is really old and crufty.

The offending qemu command line args appear to be -vga none -display
none.  I assume I have CGA because it's the fallback case in
vgacon.c if nothing matches.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-04-01 Thread Andy Lutomirski
On Apr 1, 2015 12:56 PM, Kay Sievers k...@vrfy.org wrote:

 On Wed, Apr 1, 2015 at 9:36 PM, Andy Lutomirski l...@amacapital.net wrote:
  On Wed, Apr 1, 2015 at 12:32 PM, Kay Sievers k...@vrfy.org wrote:
  On Wed, Apr 1, 2015 at 8:56 PM, Andy Lutomirski l...@amacapital.net 
  wrote:
  On Thu, Jan 22, 2015 at 6:29 PM, Andy Lutomirski l...@amacapital.net 
  wrote:
  On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering
  lenn...@poettering.net wrote:
  On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote:
 
  Hi all-
 
  When running virtme (a simple vm gadget) on Fedora 21, the slowest
  part of bootup by far appears to be systemd-vconsole-setup:
 
  # time /usr/lib/systemd/systemd-vconsole-setup
  putfont: PIO_FONT trying ...
  ...
  setfont: putfont: 512,8x16:  failed: -1
  putfont: PIO_FONT: Invalid argument
  /usr/bin/setfont failed with error code 71.
 
  setfont is not part of systemd, we just invoke it. If that fails, this
  is a problem somewhere between the VM, the kernel and console-tools.
 
 
  Aha -- I missed that systemd-vconsole-setup calls setfont.  I can
  trigger the same problem by just typing setfont.  For whatever reason,
  my other Fedora 21 computer only has this problem if I type setfont
  and not if I run systemd-vconcole-setup.
 
  My uneducated guess is that your virtual machine boots up with a
  non-graphical console, and the tool thus tries to upload the fonts
  into the good old VGA hw text mode glyph tables, and qemu is very slow
  at that... Or something like that.
 
  setfont is doing this:
 
  nanosleep({0, 25000}, NULL) = 0
  ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
  write(2, ., 1.)= 1
  nanosleep({0, 25000}, NULL) = 0
  ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
  write(2, ., 1.)= 1
  nanosleep({0, 25000}, NULL) = 0
  ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
  write(2, ., 1.)= 1
  nanosleep({0, 25000}, NULL) = 0
  ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
  write(2, ., 1.)= 1
  nanosleep({0, 25000}, NULL) = 0
  ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
  write(2, ., 1.)= 1
 
  This thing has only a serial console:
 
  # cat /proc/consoles
  ttyS0-W- (EC   a)4:64
 
  setfont does this:
 
  /* we allow ourselves to hang here for ca 5 seconds, xdm may
  be playing tricks on us. */
  while ((loop++  20)  (i = ioctl(fd, PIO_FONT, buf)))
{
  if (loop = 1)
fprintf(stderr, putfont: PIO_FONT trying ...\n);
  else
fprintf(stderr, .);
  usleep(25);
}
  fprintf(stderr, \n);
 
  Alexey, would it make sense to remove this loop or to add a way to turn 
  it off?
 
  Ping, everyone?
 
  This issue still exists.  AFAICT systemd is relying on a really old
  tool, that that really old tool (setfont) is sometimes delaying boot
  by a very large amount.  Can we either fix the tool (Alexey) or stop
  using it (systemd people)?
 
  Hmm, why is the vm gadget you run configuring a custom console font
  at all? If there is no custom font specified in t he config, systemd
  will not run setfont.
 
  It's not intentionally configuring a custom font, but it might be
  inheriting Fedora's settings.

 Ideally, /etc/vconsole.conf does not even exist in a default setup. It
 is only needed for foreign language keyboard support or more exotic
 font requirements.

  Or did you mean to have vconsole-setup detect that it should not even
  try to run setfont? Not sure how to find that out.
 
  I don't really see how vconsole-setup could get rid of calling setfont
  from systemd, it is needed in many setups.
 
  vconsole-setup could set the font itself instead of using setfont if
  setfont can't be configured or fixed not to keep retrying for five
  seconds (!).

 It is a rather complex logic which would need to be duplicated in
 systemd. We so far have avoided it, because the kernel VC font and
 keymaps are so conceptually limited, that it does not really make
 sense to build a modern system on top of it.

 If systemd gets advanvced console support with systemd-consoled,
 we need full unicode support, high-dpi display support, display hotplug,
 ..., all things the kernel's vc stuff will never give us. That is why we rely
 on setfont and loadkeys for now.

  Ideally, I think that setfont would just stop retrying on failure.

 Right, that sounds like a simple and sensible fix.

Too bad no one from the kbd list has replied :(


  Or
  perhaps all of this could go through udev or some other mechanism that
  doesn't try to set the font until the device actually exists.  But the
  console system is weird and may be that's hard

Re: [systemd-devel] systemd-vconsole-setup fails very slowly

2015-01-22 Thread Andy Lutomirski
On Thu, Jan 22, 2015 at 6:13 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Wed, 21.01.15 19:15, Andy Lutomirski (l...@amacapital.net) wrote:

 Hi all-

 When running virtme (a simple vm gadget) on Fedora 21, the slowest
 part of bootup by far appears to be systemd-vconsole-setup:

 # time /usr/lib/systemd/systemd-vconsole-setup
 putfont: PIO_FONT trying ...
 ...
 setfont: putfont: 512,8x16:  failed: -1
 putfont: PIO_FONT: Invalid argument
 /usr/bin/setfont failed with error code 71.

 setfont is not part of systemd, we just invoke it. If that fails, this
 is a problem somewhere between the VM, the kernel and console-tools.


Aha -- I missed that systemd-vconsole-setup calls setfont.  I can
trigger the same problem by just typing setfont.  For whatever reason,
my other Fedora 21 computer only has this problem if I type setfont
and not if I run systemd-vconcole-setup.

 My uneducated guess is that your virtual machine boots up with a
 non-graphical console, and the tool thus tries to upload the fonts
 into the good old VGA hw text mode glyph tables, and qemu is very slow
 at that... Or something like that.

setfont is doing this:

nanosleep({0, 25000}, NULL) = 0
ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
write(2, ., 1.)= 1
nanosleep({0, 25000}, NULL) = 0
ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
write(2, ., 1.)= 1
nanosleep({0, 25000}, NULL) = 0
ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
write(2, ., 1.)= 1
nanosleep({0, 25000}, NULL) = 0
ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
write(2, ., 1.)= 1
nanosleep({0, 25000}, NULL) = 0
ioctl(3, PIO_FONT, 0xfbc010)= -1 EINVAL (Invalid argument)
write(2, ., 1.)= 1

This thing has only a serial console:

# cat /proc/consoles
ttyS0-W- (EC   a)4:64

setfont does this:

/* we allow ourselves to hang here for ca 5 seconds, xdm may
be playing tricks on us. */
while ((loop++  20)  (i = ioctl(fd, PIO_FONT, buf)))
  {
if (loop = 1)
  fprintf(stderr, putfont: PIO_FONT trying ...\n);
else
  fprintf(stderr, .);
usleep(25);
  }
fprintf(stderr, \n);

Alexey, would it make sense to remove this loop or to add a way to turn it off?

Thanks,
Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] systemd-vconsole-setup fails very slowly

2015-01-21 Thread Andy Lutomirski
Hi all-

When running virtme (a simple vm gadget) on Fedora 21, the slowest
part of bootup by far appears to be systemd-vconsole-setup:

# time /usr/lib/systemd/systemd-vconsole-setup
putfont: PIO_FONT trying ...
...
setfont: putfont: 512,8x16:  failed: -1
putfont: PIO_FONT: Invalid argument
/usr/bin/setfont failed with error code 71.

real0m5.361s
user0m0.079s
sys0m0.093s

I have no idea what it's trying to do, why it thinks it should work,
or why it doesn't work, but can it be updated to fail quickly.  The VM
in question has no graphics:

# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 Unclassified device [0002]: Red Hat, Inc Virtio filesystem
00:03.0 System peripheral: Intel Corporation 6300ESB Watchdog Timer

To reproduce on Fedora 21:

$ sudo dnf install virtme
$ virtme-run --installed-kernel

Then twiddle your thumbs for ~6.3 seconds instead of the ~1.3 seconds
of thumb twiddling that would be required AFAICT without this issue.

Thanks,
Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-12-10 Thread Andy Lutomirski
On Tue, Dec 9, 2014 at 12:46 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Mon, Nov 3, 2014 at 12:41 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Mon, Nov 3, 2014 at 12:21 PM, Jiri Kosina jkos...@suse.cz wrote:
 On Mon, 3 Nov 2014, David Herrmann wrote:

  Agreed, mostly.  My only real concern is that this could be annoying
  for the userspace developers who will need to target Linux and HIDAPI
  separately.  Admittedly the Linux support will be trivial.

 I see. I'll not stop you from using hidraw, I'd just not recommend it.
 Especially for security stuff I dislike exposing the whole HID device
 writable. But yeah, I guess you got my point.

 Alright, I am basically thinking loudly now, but how about we allow HID
 drivers that would override default hidraw interface?

 I am very well aware of the fact that this could be opening a can of
 worms, so we'll have to make it very restrictive. How about, let's say, we
 allow HID drivers to provide custom hidraw interface (completely
 overriding the one that HID core would normally create) only for cases
 such as:

 - the intent is to expose only certain parts of a combined device
 - the intent is to introduce some level of access control

 I.e. still no interference of kernel with data parsing allowed.

 Hmm.  Would this be like a filter on hidraw actions?

 How would udev distinguish these special hidraw devices from normal
 hidraw devices?

 Also, for U2F, this could be a little awkward.  There's some crazy
 fragmentation stuff to allow a U2F request to be split across HID
 requests, and I think a kernel driver would much rather get the
 original unfragmented application request.


 I started writing a driver for this.  I got enumeration working.  I
 assume I should create a u2f device class, and then... ?

 Where am I supposed to get my character device from?  Do I register my
 own chrdev major?  Do I use misc?  Is there some input thing I'm
 supposed to use?

Another question:

I'm hitting this in hid_input_report:

if (down_trylock(hid-driver_input_lock)) {
pr_err(HID: trylock failed\n);  // I added this
return -EBUSY;
}

This is a problem for u2f: u2f reports are actual protocol messages,
and there isn't a retransmit mechanism.  Losing messages randomly
causes the handshake to fail, and then nothing works.

I *think* that the only way I can hit that failure is during probe (or
if the USB stack somehow completes two transfers at once). This still
makes it quite awkward to start IO from the probe routine.

I could start a work item to take driver_input_lock, release it again,
and then start the handshake, but that sucks.

Ideas?

--Andy


 Thanks,
 Andy

 --Andy


  I can give the driver a try.  It'll actually be simpler than the spec
  makes it out, since a real kernel driver will have no need to
  arbitrate with itself.

 I'll gladly review any custom HID drivers. Feel free to put me on CC
 when sending to linux-input. There's also a lot of examples in
 drivers/hid/ in case you need a head-start (especially if you need the
 raw_event callback).

 Same here, of course.

 Please always CC me in parallel to sending to linux-input@ to make sure
 that the patch doesn't fall in between cracks.

 Thanks,

 --
 Jiri Kosina
 SUSE Labs



 --
 Andy Lutomirski
 AMA Capital Management, LLC



 --
 Andy Lutomirski
 AMA Capital Management, LLC



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-12-09 Thread Andy Lutomirski
On Mon, Nov 3, 2014 at 12:41 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Mon, Nov 3, 2014 at 12:21 PM, Jiri Kosina jkos...@suse.cz wrote:
 On Mon, 3 Nov 2014, David Herrmann wrote:

  Agreed, mostly.  My only real concern is that this could be annoying
  for the userspace developers who will need to target Linux and HIDAPI
  separately.  Admittedly the Linux support will be trivial.

 I see. I'll not stop you from using hidraw, I'd just not recommend it.
 Especially for security stuff I dislike exposing the whole HID device
 writable. But yeah, I guess you got my point.

 Alright, I am basically thinking loudly now, but how about we allow HID
 drivers that would override default hidraw interface?

 I am very well aware of the fact that this could be opening a can of
 worms, so we'll have to make it very restrictive. How about, let's say, we
 allow HID drivers to provide custom hidraw interface (completely
 overriding the one that HID core would normally create) only for cases
 such as:

 - the intent is to expose only certain parts of a combined device
 - the intent is to introduce some level of access control

 I.e. still no interference of kernel with data parsing allowed.

 Hmm.  Would this be like a filter on hidraw actions?

 How would udev distinguish these special hidraw devices from normal
 hidraw devices?

 Also, for U2F, this could be a little awkward.  There's some crazy
 fragmentation stuff to allow a U2F request to be split across HID
 requests, and I think a kernel driver would much rather get the
 original unfragmented application request.


I started writing a driver for this.  I got enumeration working.  I
assume I should create a u2f device class, and then... ?

Where am I supposed to get my character device from?  Do I register my
own chrdev major?  Do I use misc?  Is there some input thing I'm
supposed to use?

Thanks,
Andy

 --Andy


  I can give the driver a try.  It'll actually be simpler than the spec
  makes it out, since a real kernel driver will have no need to
  arbitrate with itself.

 I'll gladly review any custom HID drivers. Feel free to put me on CC
 when sending to linux-input. There's also a lot of examples in
 drivers/hid/ in case you need a head-start (especially if you need the
 raw_event callback).

 Same here, of course.

 Please always CC me in parallel to sending to linux-input@ to make sure
 that the patch doesn't fall in between cracks.

 Thanks,

 --
 Jiri Kosina
 SUSE Labs



 --
 Andy Lutomirski
 AMA Capital Management, LLC



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-12-01 Thread Andy Lutomirski
On Sun, Nov 30, 2014 at 11:03 PM, Konstantin Khlebnikov
koc...@gmail.com wrote:
 Hmm. What about per-task/thread UUID? exported via separate file: 
 /proc/PID/uuid
 It could be created at the first access, thus this wouldn't shlowdown clone().
 Also it could be droped at execve(), so it'll describe execution
 context more precisely than pid.


I'd be all for this if it weren't for two issues:

1. This thing needs to work as soon as init is started, and we can't
reliably generate random numbers that early.

2. Migration / CRIU is rather fundamentally at odds with making
anything universally unique, as opposed to just per-user-ns.

So, given that I don't think we can actually provide a universally
unique pid-like thing, I'd rather not even try.

That being said, I want to rework this a little bit.  I think that
highpid should be restricted to the range 2^32 through 2^64 - 4096.
The former prevents anyone from confusing highpid with regular pid,
and the latter means that we don't need to worry about confusion
between errors and valid highpids (e.g. -1 will never be a highpid).

Implementing that will be only mildly annoying.

--Andy

 On Sat, Nov 29, 2014 at 2:05 AM, Andy Lutomirski l...@amacapital.net wrote:
 Pid reuse is common, which means that it's difficult or impossible
 to read information about a pid from /proc without races.

 This introduces a second number associated with each (task, pidns)
 pair called highpid.  Highpid is a 64-bit number, and, barring
 extremely unlikely circumstances or outright error, a (highpid, pid)
 will never be reused.

 With just this change, a program can open /proc/PID/status, read the
 Highpid field, and confirm that it has the expected value.  If the
 pid has been reused, then highpid will be different.

 The initial implementation is straightforward: highpid is simply a
 64-bit counter. If a high-end system can fork every 3 ns (which
 would be amazing, given that just allocating a pid requires at
 atomic operation), it would take well over 1000 years for highpid to
 wrap.

 For CRIU's benefit, the next highpid can be set by a privileged
 user.

 NB: The sysctl stuff only works on 64-bit systems.  If the approach
 looks good, I'll fix that somehow.

 Signed-off-by: Andy Lutomirski l...@amacapital.net
 ---

 If this goes in, there's plenty of room to add new interfaces to
 make this more useful.  For example, we could add a fancier tgkill
 that adds and validates hightgid and highpid, and we might want to
 add a syscall to read one's own hightgid and highpid.  These would
 be quite useful for pidfiles.

 David, would this be useful for kdbus?

 CRIU people: will this be unduly difficult to support in CRIU?

 If you all think this is a good idea, I'll fix the sysctl stuff and
 document it.  It might also be worth adding Hightgid to status.

  fs/proc/array.c   |  5 -
  include/linux/pid.h   |  2 ++
  include/linux/pid_namespace.h |  1 +
  kernel/pid.c  | 19 +++
  kernel/pid_namespace.c| 22 ++
  5 files changed, 44 insertions(+), 5 deletions(-)

 diff --git a/fs/proc/array.c b/fs/proc/array.c
 index cd3653e4f35c..f1e0e69d19f9 100644
 --- a/fs/proc/array.c
 +++ b/fs/proc/array.c
 @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, struct 
 pid_namespace *ns,
 int g;
 struct fdtable *fdt = NULL;
 const struct cred *cred;
 +   const struct upid *upid;
 pid_t ppid, tpid;

 rcu_read_lock();
 @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, 
 struct pid_namespace *ns,
 if (tracer)
 tpid = task_pid_nr_ns(tracer, ns);
 }
 +   upid = pid_upid_ns(pid, ns);
 cred = get_task_cred(p);
 seq_printf(m,
 State:\t%s\n
 Tgid:\t%d\n
 Ngid:\t%d\n
 Pid:\t%d\n
 +   Highpid:\t%llu\n
 PPid:\t%d\n
 TracerPid:\t%d\n
 Uid:\t%d\t%d\t%d\t%d\n
 @@ -183,7 +186,7 @@ static inline void task_state(struct seq_file *m, struct 
 pid_namespace *ns,
 get_task_state(p),
 task_tgid_nr_ns(p, ns),
 task_numa_group_id(p),
 -   pid_nr_ns(pid, ns),
 +   upid ? upid-nr : 0, upid ? upid-highnr : 0,
 ppid, tpid,
 from_kuid_munged(user_ns, cred-uid),
 from_kuid_munged(user_ns, cred-euid),
 diff --git a/include/linux/pid.h b/include/linux/pid.h
 index 23705a53abba..ece70b64d04c 100644
 --- a/include/linux/pid.h
 +++ b/include/linux/pid.h
 @@ -51,6 +51,7 @@ struct upid {
 /* Try to keep pid_chain in the same cacheline as nr for find_vpid */
 int nr;
 struct pid_namespace *ns;
 +   u64 highnr;
 struct hlist_node pid_chain;
  };

 @@ -170,6 +171,7 @@ static inline pid_t pid_nr(struct pid *pid)
  }

  pid_t pid_nr_ns

Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-12-01 Thread Andy Lutomirski
On Mon, Dec 1, 2014 at 8:39 AM, Konstantin Khlebnikov koc...@gmail.com wrote:
 On Mon, Dec 1, 2014 at 7:21 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Sun, Nov 30, 2014 at 11:03 PM, Konstantin Khlebnikov
 koc...@gmail.com wrote:
 Hmm. What about per-task/thread UUID? exported via separate file: 
 /proc/PID/uuid
 It could be created at the first access, thus this wouldn't shlowdown 
 clone().
 Also it could be droped at execve(), so it'll describe execution
 context more precisely than pid.


 I'd be all for this if it weren't for two issues:

 1. This thing needs to work as soon as init is started, and we can't
 reliably generate random numbers that early.

 2. Migration / CRIU is rather fundamentally at odds with making
 anything universally unique, as opposed to just per-user-ns.

 So, given that I don't think we can actually provide a universally
 unique pid-like thing, I'd rather not even try.

 Well... another idea: what about pid generation counter? Of course it
 should be per-pid-ns.
 As I see struct upid has nice 32-bit hole next to 'nr' field. (on
 64-bit systems)


I thought about that, but it has two issues:

1. Implementing it will be pain.  The pid allocation algorithm is
already complicated, and it wasn't obvious to me how to accurately
detect wraparound without racing against other wrapping users.

2. I don't think it will be reliable.  Suppose that there are pid_max
- 1 processes.  One of them can repeatedly clone and exit,
incrementing the generation counter each time.  After 2^32 iterations,
which won't take all that long, the pid generation will wrap.

--Andy


 That being said, I want to rework this a little bit.  I think that
 highpid should be restricted to the range 2^32 through 2^64 - 4096.
 The former prevents anyone from confusing highpid with regular pid,
 and the latter means that we don't need to worry about confusion
 between errors and valid highpids (e.g. -1 will never be a highpid).

 Implementing that will be only mildly annoying.

 --Andy

 On Sat, Nov 29, 2014 at 2:05 AM, Andy Lutomirski l...@amacapital.net 
 wrote:
 Pid reuse is common, which means that it's difficult or impossible
 to read information about a pid from /proc without races.

 This introduces a second number associated with each (task, pidns)
 pair called highpid.  Highpid is a 64-bit number, and, barring
 extremely unlikely circumstances or outright error, a (highpid, pid)
 will never be reused.

 With just this change, a program can open /proc/PID/status, read the
 Highpid field, and confirm that it has the expected value.  If the
 pid has been reused, then highpid will be different.

 The initial implementation is straightforward: highpid is simply a
 64-bit counter. If a high-end system can fork every 3 ns (which
 would be amazing, given that just allocating a pid requires at
 atomic operation), it would take well over 1000 years for highpid to
 wrap.

 For CRIU's benefit, the next highpid can be set by a privileged
 user.

 NB: The sysctl stuff only works on 64-bit systems.  If the approach
 looks good, I'll fix that somehow.

 Signed-off-by: Andy Lutomirski l...@amacapital.net
 ---

 If this goes in, there's plenty of room to add new interfaces to
 make this more useful.  For example, we could add a fancier tgkill
 that adds and validates hightgid and highpid, and we might want to
 add a syscall to read one's own hightgid and highpid.  These would
 be quite useful for pidfiles.

 David, would this be useful for kdbus?

 CRIU people: will this be unduly difficult to support in CRIU?

 If you all think this is a good idea, I'll fix the sysctl stuff and
 document it.  It might also be worth adding Hightgid to status.

  fs/proc/array.c   |  5 -
  include/linux/pid.h   |  2 ++
  include/linux/pid_namespace.h |  1 +
  kernel/pid.c  | 19 +++
  kernel/pid_namespace.c| 22 ++
  5 files changed, 44 insertions(+), 5 deletions(-)

 diff --git a/fs/proc/array.c b/fs/proc/array.c
 index cd3653e4f35c..f1e0e69d19f9 100644
 --- a/fs/proc/array.c
 +++ b/fs/proc/array.c
 @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, 
 struct pid_namespace *ns,
 int g;
 struct fdtable *fdt = NULL;
 const struct cred *cred;
 +   const struct upid *upid;
 pid_t ppid, tpid;

 rcu_read_lock();
 @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, 
 struct pid_namespace *ns,
 if (tracer)
 tpid = task_pid_nr_ns(tracer, ns);
 }
 +   upid = pid_upid_ns(pid, ns);
 cred = get_task_cred(p);
 seq_printf(m,
 State:\t%s\n
 Tgid:\t%d\n
 Ngid:\t%d\n
 Pid:\t%d\n
 +   Highpid:\t%llu\n
 PPid:\t%d\n
 TracerPid:\t%d\n
 Uid:\t%d\t%d\t%d\t%d\n
 @@ -183,7 +186,7 @@ static inline void task_state

Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-11-30 Thread Andy Lutomirski
On Nov 30, 2014 9:45 AM, David Herrmann dh.herrm...@gmail.com wrote:

 Hi Andy

 On Sat, Nov 29, 2014 at 12:05 AM, Andy Lutomirski l...@amacapital.net wrote:
  Pid reuse is common, which means that it's difficult or impossible
  to read information about a pid from /proc without races.
 
  This introduces a second number associated with each (task, pidns)
  pair called highpid.  Highpid is a 64-bit number, and, barring
  extremely unlikely circumstances or outright error, a (highpid, pid)
  will never be reused.
 
  With just this change, a program can open /proc/PID/status, read the
  Highpid field, and confirm that it has the expected value.  If the
  pid has been reused, then highpid will be different.
 
  The initial implementation is straightforward: highpid is simply a
  64-bit counter. If a high-end system can fork every 3 ns (which
  would be amazing, given that just allocating a pid requires at
  atomic operation), it would take well over 1000 years for highpid to
  wrap.
 
  For CRIU's benefit, the next highpid can be set by a privileged
  user.
 
  NB: The sysctl stuff only works on 64-bit systems.  If the approach
  looks good, I'll fix that somehow.
 
  Signed-off-by: Andy Lutomirski l...@amacapital.net
  ---
 
  If this goes in, there's plenty of room to add new interfaces to
  make this more useful.  For example, we could add a fancier tgkill
  that adds and validates hightgid and highpid, and we might want to
  add a syscall to read one's own hightgid and highpid.  These would
  be quite useful for pidfiles.
 
  David, would this be useful for kdbus?

 Much appreciated! This would serve well as replacement for
 'starttime'. I'd prefer if pid_t was 64bit, but I guess that ship
 sailed long ago. Though, your patch might in the end just introduce a
 new pid64, which replaces the old pid and lives in parallel.

 Anyway, considering that we actually want the same pid-reuse
 protection for tid, tgid, ppid and so on, we'd have to add a
 'starttime' for all of them. Sounds ugly.. so we might just end up
 dropping 'starttime' and introduce KDBUS_ITEM_PIDS2 whenever a 'fix'
 is merged upstream.

I don't understand we want.  Doesn't kdbus currently only think
about tid and tgid?

In any event, I just looked at the kdbus code
(https://github.com/gregkh/kdbus/blob/master/metadata.c), and I see:

meta-pid = get_pid(task_pid(current));
meta-tgid = get_pid(task_tgid(current));
meta-starttime = current-start_time,

I don't understand how that concept of starttime is particularly
useful.  Isn't that the start time associated with tid?  How can that
be useful when looking up tgid?

Regardless, my patch here associates a highpid with each struct pid,
and both tid and tgid are just struct pids, so you get both.  I agree
that adding Highppid to /proc might be useful, but that seems like a
future extension that has nothing to do with kdbus.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-11-30 Thread Andy Lutomirski
On Nov 30, 2014 1:47 AM, Florian Weimer f...@deneb.enyo.de wrote:

 * Andy Lutomirski:

  The initial implementation is straightforward: highpid is simply a
  64-bit counter. If a high-end system can fork every 3 ns (which
  would be amazing, given that just allocating a pid requires at
  atomic operation), it would take well over 1000 years for highpid to
  wrap.

 I'm not sure if I'm reading the patch correctly, but is the counter
 namespaced?  If yes, why?

It's namespaced so that CRIU can migrate/restore a whole pid namespace.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-11-29 Thread Andy Lutomirski
On Nov 28, 2014 9:24 PM, Greg KH g...@kroah.com wrote:

 On Fri, Nov 28, 2014 at 03:05:01PM -0800, Andy Lutomirski wrote:
  Pid reuse is common, which means that it's difficult or impossible
  to read information about a pid from /proc without races.
 
  This introduces a second number associated with each (task, pidns)
  pair called highpid.  Highpid is a 64-bit number, and, barring
  extremely unlikely circumstances or outright error, a (highpid, pid)
  will never be reused.
 
  With just this change, a program can open /proc/PID/status, read the
  Highpid field, and confirm that it has the expected value.  If the
  pid has been reused, then highpid will be different.
 
  The initial implementation is straightforward: highpid is simply a
  64-bit counter. If a high-end system can fork every 3 ns (which
  would be amazing, given that just allocating a pid requires at
  atomic operation), it would take well over 1000 years for highpid to
  wrap.
 
  For CRIU's benefit, the next highpid can be set by a privileged
  user.
 
  NB: The sysctl stuff only works on 64-bit systems.  If the approach
  looks good, I'll fix that somehow.
 
  Signed-off-by: Andy Lutomirski l...@amacapital.net
  ---
 
  If this goes in, there's plenty of room to add new interfaces to
  make this more useful.  For example, we could add a fancier tgkill
  that adds and validates hightgid and highpid, and we might want to
  add a syscall to read one's own hightgid and highpid.  These would
  be quite useful for pidfiles.
 
  David, would this be useful for kdbus?
 
  CRIU people: will this be unduly difficult to support in CRIU?
 
  If you all think this is a good idea, I'll fix the sysctl stuff and
  document it.  It might also be worth adding Hightgid to status.
 
   fs/proc/array.c   |  5 -
   include/linux/pid.h   |  2 ++
   include/linux/pid_namespace.h |  1 +
   kernel/pid.c  | 19 +++
   kernel/pid_namespace.c| 22 ++
   5 files changed, 44 insertions(+), 5 deletions(-)
 
  diff --git a/fs/proc/array.c b/fs/proc/array.c
  index cd3653e4f35c..f1e0e69d19f9 100644
  --- a/fs/proc/array.c
  +++ b/fs/proc/array.c
  @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, 
  struct pid_namespace *ns,
int g;
struct fdtable *fdt = NULL;
const struct cred *cred;
  + const struct upid *upid;
pid_t ppid, tpid;
 
rcu_read_lock();
  @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, 
  struct pid_namespace *ns,
if (tracer)
tpid = task_pid_nr_ns(tracer, ns);
}
  + upid = pid_upid_ns(pid, ns);
cred = get_task_cred(p);
seq_printf(m,
State:\t%s\n
Tgid:\t%d\n
Ngid:\t%d\n
Pid:\t%d\n
  + Highpid:\t%llu\n
PPid:\t%d\n
TracerPid:\t%d\n
Uid:\t%d\t%d\t%d\t%d\n

 Changing existing proc files like this is dangerous and can cause lots
 of breakage in userspace programs if you are not careful.  Usually
 adding fields to the end of the file is best, but sometimes even then
 things can break :(

Point taken.  I had hoped that everything would parse /proc/PID/status
by looking for the lamed headers.  I'll see what the history of new
fields in there is, but I can change this easily enough.

--Andy

 --
 To unsubscribe from this list: send the line unsubscribe linux-api in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-11-28 Thread Andy Lutomirski
Pid reuse is common, which means that it's difficult or impossible
to read information about a pid from /proc without races.

This introduces a second number associated with each (task, pidns)
pair called highpid.  Highpid is a 64-bit number, and, barring
extremely unlikely circumstances or outright error, a (highpid, pid)
will never be reused.

With just this change, a program can open /proc/PID/status, read the
Highpid field, and confirm that it has the expected value.  If the
pid has been reused, then highpid will be different.

The initial implementation is straightforward: highpid is simply a
64-bit counter. If a high-end system can fork every 3 ns (which
would be amazing, given that just allocating a pid requires at
atomic operation), it would take well over 1000 years for highpid to
wrap.

For CRIU's benefit, the next highpid can be set by a privileged
user.

NB: The sysctl stuff only works on 64-bit systems.  If the approach
looks good, I'll fix that somehow.

Signed-off-by: Andy Lutomirski l...@amacapital.net
---

If this goes in, there's plenty of room to add new interfaces to
make this more useful.  For example, we could add a fancier tgkill
that adds and validates hightgid and highpid, and we might want to
add a syscall to read one's own hightgid and highpid.  These would
be quite useful for pidfiles.

David, would this be useful for kdbus?

CRIU people: will this be unduly difficult to support in CRIU?

If you all think this is a good idea, I'll fix the sysctl stuff and
document it.  It might also be worth adding Hightgid to status.

 fs/proc/array.c   |  5 -
 include/linux/pid.h   |  2 ++
 include/linux/pid_namespace.h |  1 +
 kernel/pid.c  | 19 +++
 kernel/pid_namespace.c| 22 ++
 5 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index cd3653e4f35c..f1e0e69d19f9 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, struct 
pid_namespace *ns,
int g;
struct fdtable *fdt = NULL;
const struct cred *cred;
+   const struct upid *upid;
pid_t ppid, tpid;
 
rcu_read_lock();
@@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, struct 
pid_namespace *ns,
if (tracer)
tpid = task_pid_nr_ns(tracer, ns);
}
+   upid = pid_upid_ns(pid, ns);
cred = get_task_cred(p);
seq_printf(m,
State:\t%s\n
Tgid:\t%d\n
Ngid:\t%d\n
Pid:\t%d\n
+   Highpid:\t%llu\n
PPid:\t%d\n
TracerPid:\t%d\n
Uid:\t%d\t%d\t%d\t%d\n
@@ -183,7 +186,7 @@ static inline void task_state(struct seq_file *m, struct 
pid_namespace *ns,
get_task_state(p),
task_tgid_nr_ns(p, ns),
task_numa_group_id(p),
-   pid_nr_ns(pid, ns),
+   upid ? upid-nr : 0, upid ? upid-highnr : 0,
ppid, tpid,
from_kuid_munged(user_ns, cred-uid),
from_kuid_munged(user_ns, cred-euid),
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 23705a53abba..ece70b64d04c 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -51,6 +51,7 @@ struct upid {
/* Try to keep pid_chain in the same cacheline as nr for find_vpid */
int nr;
struct pid_namespace *ns;
+   u64 highnr;
struct hlist_node pid_chain;
 };
 
@@ -170,6 +171,7 @@ static inline pid_t pid_nr(struct pid *pid)
 }
 
 pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns);
+const struct upid *pid_upid_ns(struct pid *pid, struct pid_namespace *ns);
 pid_t pid_vnr(struct pid *pid);
 
 #define do_each_pid_task(pid, type, task)  \
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 1997ffc295a7..1f9f0455d7ef 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -26,6 +26,7 @@ struct pid_namespace {
struct rcu_head rcu;
int last_pid;
unsigned int nr_hashed;
+   atomic64_t next_highpid;
struct task_struct *child_reaper;
struct kmem_cache *pid_cachep;
unsigned int level;
diff --git a/kernel/pid.c b/kernel/pid.c
index 9b9a26698144..863d11a9bfbf 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -312,6 +312,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 
pid-numbers[i].nr = nr;
pid-numbers[i].ns = tmp;
+   pid-numbers[i].highnr =
+   atomic64_inc_return(tmp-next_highpid) - 1;
tmp = tmp-parent;
}
 
@@ -492,17 +494,26 @@ struct pid *find_get_pid(pid_t nr)
 }
 EXPORT_SYMBOL_GPL(find_get_pid);
 
-pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns)
+const struct upid *pid_upid_ns(struct pid *pid

Re: [systemd-devel] [RFC PATCH] proc, pidns: Add highpid

2014-11-28 Thread Andy Lutomirski
[Adding CRIU people.  Whoops.]

On Fri, Nov 28, 2014 at 3:05 PM, Andy Lutomirski l...@amacapital.net wrote:
 Pid reuse is common, which means that it's difficult or impossible
 to read information about a pid from /proc without races.

 This introduces a second number associated with each (task, pidns)
 pair called highpid.  Highpid is a 64-bit number, and, barring
 extremely unlikely circumstances or outright error, a (highpid, pid)
 will never be reused.

 With just this change, a program can open /proc/PID/status, read the
 Highpid field, and confirm that it has the expected value.  If the
 pid has been reused, then highpid will be different.

 The initial implementation is straightforward: highpid is simply a
 64-bit counter. If a high-end system can fork every 3 ns (which
 would be amazing, given that just allocating a pid requires at
 atomic operation), it would take well over 1000 years for highpid to
 wrap.

 For CRIU's benefit, the next highpid can be set by a privileged
 user.

 NB: The sysctl stuff only works on 64-bit systems.  If the approach
 looks good, I'll fix that somehow.

 Signed-off-by: Andy Lutomirski l...@amacapital.net
 ---

 If this goes in, there's plenty of room to add new interfaces to
 make this more useful.  For example, we could add a fancier tgkill
 that adds and validates hightgid and highpid, and we might want to
 add a syscall to read one's own hightgid and highpid.  These would
 be quite useful for pidfiles.

 David, would this be useful for kdbus?

 CRIU people: will this be unduly difficult to support in CRIU?

 If you all think this is a good idea, I'll fix the sysctl stuff and
 document it.  It might also be worth adding Hightgid to status.

  fs/proc/array.c   |  5 -
  include/linux/pid.h   |  2 ++
  include/linux/pid_namespace.h |  1 +
  kernel/pid.c  | 19 +++
  kernel/pid_namespace.c| 22 ++
  5 files changed, 44 insertions(+), 5 deletions(-)

 diff --git a/fs/proc/array.c b/fs/proc/array.c
 index cd3653e4f35c..f1e0e69d19f9 100644
 --- a/fs/proc/array.c
 +++ b/fs/proc/array.c
 @@ -159,6 +159,7 @@ static inline void task_state(struct seq_file *m, struct 
 pid_namespace *ns,
 int g;
 struct fdtable *fdt = NULL;
 const struct cred *cred;
 +   const struct upid *upid;
 pid_t ppid, tpid;

 rcu_read_lock();
 @@ -170,12 +171,14 @@ static inline void task_state(struct seq_file *m, 
 struct pid_namespace *ns,
 if (tracer)
 tpid = task_pid_nr_ns(tracer, ns);
 }
 +   upid = pid_upid_ns(pid, ns);
 cred = get_task_cred(p);
 seq_printf(m,
 State:\t%s\n
 Tgid:\t%d\n
 Ngid:\t%d\n
 Pid:\t%d\n
 +   Highpid:\t%llu\n
 PPid:\t%d\n
 TracerPid:\t%d\n
 Uid:\t%d\t%d\t%d\t%d\n
 @@ -183,7 +186,7 @@ static inline void task_state(struct seq_file *m, struct 
 pid_namespace *ns,
 get_task_state(p),
 task_tgid_nr_ns(p, ns),
 task_numa_group_id(p),
 -   pid_nr_ns(pid, ns),
 +   upid ? upid-nr : 0, upid ? upid-highnr : 0,
 ppid, tpid,
 from_kuid_munged(user_ns, cred-uid),
 from_kuid_munged(user_ns, cred-euid),
 diff --git a/include/linux/pid.h b/include/linux/pid.h
 index 23705a53abba..ece70b64d04c 100644
 --- a/include/linux/pid.h
 +++ b/include/linux/pid.h
 @@ -51,6 +51,7 @@ struct upid {
 /* Try to keep pid_chain in the same cacheline as nr for find_vpid */
 int nr;
 struct pid_namespace *ns;
 +   u64 highnr;
 struct hlist_node pid_chain;
  };

 @@ -170,6 +171,7 @@ static inline pid_t pid_nr(struct pid *pid)
  }

  pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns);
 +const struct upid *pid_upid_ns(struct pid *pid, struct pid_namespace *ns);
  pid_t pid_vnr(struct pid *pid);

  #define do_each_pid_task(pid, type, task)  \
 diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
 index 1997ffc295a7..1f9f0455d7ef 100644
 --- a/include/linux/pid_namespace.h
 +++ b/include/linux/pid_namespace.h
 @@ -26,6 +26,7 @@ struct pid_namespace {
 struct rcu_head rcu;
 int last_pid;
 unsigned int nr_hashed;
 +   atomic64_t next_highpid;
 struct task_struct *child_reaper;
 struct kmem_cache *pid_cachep;
 unsigned int level;
 diff --git a/kernel/pid.c b/kernel/pid.c
 index 9b9a26698144..863d11a9bfbf 100644
 --- a/kernel/pid.c
 +++ b/kernel/pid.c
 @@ -312,6 +312,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)

 pid-numbers[i].nr = nr;
 pid-numbers[i].ns = tmp;
 +   pid-numbers[i].highnr =
 +   atomic64_inc_return(tmp-next_highpid) - 1;
 tmp = tmp-parent

Re: [systemd-devel] [PATCH] udev: Add hidraw_id and a rule file to invoke it

2014-11-03 Thread Andy Lutomirski
On Mon, Nov 3, 2014 at 5:32 AM, Tom Gundersen t...@jklm.no wrote:
 Hi Andy,

 On Tue, Oct 28, 2014 at 11:46 PM, Andy Lutomirski l...@amacapital.net wrote:
 So far, hidraw_id detects U2F tokens and sets:
 ID_U2F_TOKEN=1
 ID_SECURITY_TOKEN=1

 This causes the uaccess rules to apply to U2F devices.
 ---

 I've never written any udev code before.  Feedback welcome.

 If you think this doesn't belong in udev, I can try to find it another home.

 As mentioned, I don't think we should do this in core udev code (udev
 rule would be ok), but making it generic and exposing all the usages
 sounds useful. I don't have a strong opinion on whether this should be
 in udev or in the kernel, but if you end up going with udev, here are
 some comments.

Thanks.  I'll wait to see what Jiri thinks before sending an updated version.


 Also, we don't want to add any more external helpers to the udev
 codebase, so we should just make this a builtin (analogous to
 udev-builtin-usb_id.c).

 Minor comments inline.

 [...]

 diff --git a/src/udev/hidraw_id/hidraw_id.c b/src/udev/hidraw_id/hidraw_id.c
 new file mode 100644
 index ..e32f222f22f9
 --- /dev/null
 +++ b/src/udev/hidraw_id/hidraw_id.c
 @@ -0,0 +1,144 @@
 +/*
 + * Copyright (c) Andrew Lutomirski, 2014
 + *
 + * This program is free software: you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License as published by

 New udev code must be LGPL2+.

OK.

[snipped things I'll change if I resubmit]


 +/* Long item; skip it. */
 +if (i + 1 = desclen) {
 +log_error(bad report_descriptor);
 +goto out;
 +}
 +i += (desc[i+1] + 3);  /* Can't overflow. */
 +continue;
 +}
 +
 +size = (sizecode  3 ? sizecode : 4);
 +
 +if (i + 1 + size  desclen) {
 +log_error(bad report_descriptor);
 +goto out;
 +}
 +
 +for (j = 0; j  size; j++)
 +value |= (desc[i + 1 + j]  8*j);
 +
 +if (type == 1  tag == 0)
 +usage_page = value;

 Should we verify that the data has the right length (2 or 4 bytes)? I
 guess we also need to handle 2-byte usage pages specially?

I think we're not supposed to check the length in general, but I
should re-read the part about two-byte usage pages.  (Although, off
the top of my head, I thought that the special case was for very long
usages, not usage pages.)

[...]

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-03 Thread Andy Lutomirski
On Mon, Nov 3, 2014 at 11:03 AM, David Herrmann dh.herrm...@gmail.com wrote:
 Hi

 On Sun, Nov 2, 2014 at 7:57 PM, Andy Lutomirski l...@amacapital.net wrote:
 I want to get U2F (universal second factor, sometimes called security
 key or even gnubby) working on Linux.  U2F tokens are HID devices
 that speak a custom protocol.  The intent is that user code will speak
 to then using something like HIDAPI.

 The trick is that, for HIDAPI to work, something needs to recognize
 these devices and get udev to set appropriate device permissions.

 [snip]

  - An actual kernel driver for U2F devices using the hid group
 mechanism for enumeration.  This seems overcomplicated.

 Imho, this is the way to go. Create a proper char-dev for U2F, create
 an API and make it work.

 We had this discussion earlier about vendor-extensions that should be
 writable via hidraw from user-space. This turned out to be really
 messy.. and was discussed for several weeks straight. hidraw just
 wasn't designed as unprivileged user-space API. For instance, what
 happens if a device provides U2F plus something else? Both will be on
 the same hidraw device.
 We could split hidraw per usage, but I don't see how that is superior
 to a proper U2F API. And once one usage can affect a device as a whole
 (like power-off), you're screwed.

Agreed, mostly.  My only real concern is that this could be annoying
for the userspace developers who will need to target Linux and HIDAPI
separately.  Admittedly the Linux support will be trivial.

I can give the driver a try.  It'll actually be simpler than the spec
makes it out, since a real kernel driver will have no need to
arbitrate with itself.

--Andy


 Just look at the libusb mess where some devices are handled in the
 kernel and some in user-space (eg., see Gnome cheese, media devices,
 ...). I don't think we should repeat that with HID.

 Thanks
 David



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-03 Thread Andy Lutomirski
On Mon, Nov 3, 2014 at 12:21 PM, Jiri Kosina jkos...@suse.cz wrote:
 On Mon, 3 Nov 2014, David Herrmann wrote:

  Agreed, mostly.  My only real concern is that this could be annoying
  for the userspace developers who will need to target Linux and HIDAPI
  separately.  Admittedly the Linux support will be trivial.

 I see. I'll not stop you from using hidraw, I'd just not recommend it.
 Especially for security stuff I dislike exposing the whole HID device
 writable. But yeah, I guess you got my point.

 Alright, I am basically thinking loudly now, but how about we allow HID
 drivers that would override default hidraw interface?

 I am very well aware of the fact that this could be opening a can of
 worms, so we'll have to make it very restrictive. How about, let's say, we
 allow HID drivers to provide custom hidraw interface (completely
 overriding the one that HID core would normally create) only for cases
 such as:

 - the intent is to expose only certain parts of a combined device
 - the intent is to introduce some level of access control

 I.e. still no interference of kernel with data parsing allowed.

Hmm.  Would this be like a filter on hidraw actions?

How would udev distinguish these special hidraw devices from normal
hidraw devices?

Also, for U2F, this could be a little awkward.  There's some crazy
fragmentation stuff to allow a U2F request to be split across HID
requests, and I think a kernel driver would much rather get the
original unfragmented application request.

--Andy


  I can give the driver a try.  It'll actually be simpler than the spec
  makes it out, since a real kernel driver will have no need to
  arbitrate with itself.

 I'll gladly review any custom HID drivers. Feel free to put me on CC
 when sending to linux-input. There's also a lot of examples in
 drivers/hid/ in case you need a head-start (especially if you need the
 raw_event callback).

 Same here, of course.

 Please always CC me in parallel to sending to linux-input@ to make sure
 that the patch doesn't fall in between cracks.

 Thanks,

 --
 Jiri Kosina
 SUSE Labs



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
I want to get U2F (universal second factor, sometimes called security
key or even gnubby) working on Linux.  U2F tokens are HID devices
that speak a custom protocol.  The intent is that user code will speak
to then using something like HIDAPI.

The trick is that, for HIDAPI to work, something needs to recognize
these devices and get udev to set appropriate device permissions.

My question is: how should this be done?  The official way to
enumerate U2F devices is to look for a HID usage page 0xf1d0
containing usage 0x1.

Options include:

 - A builtin udev helper that reads the sysfs report_descriptor for
hid or hidraw devices and sets attributes accordingly (either
ID_SECURITY_TOKEN or something more general).

- A udev helper that does this and doesn't live in the systemd tree.
I don't love this option -- I'd prefer for this to be as plug-and-play
as possible.

- HID core code in the kernel to add
HID_USAGES=f1d1:lots:of:other:things to the uevent (or udev code
to do the same).  This might end up producing a rather long string or
some devices.

 - An actual kernel driver for U2F devices using the hid group
mechanism for enumeration.  This seems overcomplicated.

Concretely, my U2F device is:

/devices/pci:00/:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:1050:0120.0011/hidraw/hidraw0

The 0xf1d0 thing is enumerable on
/devices/pci:00/:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:1050:0120.0011.

Thoughts?

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
On Sun, Nov 2, 2014 at 12:42 PM, Jiri Kosina jkos...@suse.cz wrote:
 On Sun, 2 Nov 2014, Andy Lutomirski wrote:

 I want to get U2F (universal second factor, sometimes called security
 key or even gnubby) working on Linux.  U2F tokens are HID devices
 that speak a custom protocol.  The intent is that user code will speak
 to then using something like HIDAPI.

 The trick is that, for HIDAPI to work, something needs to recognize
 these devices and get udev to set appropriate device permissions.

 Just to make sure we are on the same page -- this is really only about
 setting proper device node permissions and nothing else, right?

Yes.


 My question is: how should this be done?  The official way to
 enumerate U2F devices is to look for a HID usage page 0xf1d0
 containing usage 0x1.

 Options include:

  - A builtin udev helper that reads the sysfs report_descriptor for
 hid or hidraw devices and sets attributes accordingly (either
 ID_SECURITY_TOKEN or something more general).

 Hmmm ... please keep in mind that report_descriptor is actually in
 debugfs, so it's a bit questionable whether you can rely on it being
 present on well-defined location on all systems.


Huh?  I have 
/sys/devices/pci:00/:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:1050:0120.0013/report_descriptor
on my system.

 You can get it from different other places though, such as libusb (but
 then you are limited only to USB HID devices ... which might be enough
 in your particular case).

I have no idea whether there will ever be a bluetooth U2F device.


 - A udev helper that does this and doesn't live in the systemd tree.
 I don't love this option -- I'd prefer for this to be as plug-and-play
 as possible.

 Agreed.

 - HID core code in the kernel to add
 HID_USAGES=f1d1:lots:of:other:things to the uevent (or udev code
 to do the same).  This might end up producing a rather long string or
 some devices.

 We have been thinking about this quite a lot in the past, and decided to
 postpone this until there is a very good reason for it to happen.

I'm not sure that U2F counts as a very good reason.


  - An actual kernel driver for U2F devices using the hid group
 mechanism for enumeration.  This seems overcomplicated.

 Hmmm ... why do you think so? I believe it actually should be really
 super-trivial.

The HID group part is trivial, but what interface would the driver
expose?  I don't think that a udev rule for hidraw matching the
modalias makes any sense, and, if it were a real driver, what
interface would it expose?  Most cross-platform userspace code for U2F
is likely to use HIDAPI.

Admittedly, a U2F driver in the kernel would be slightly nicer from a
multiplexing standpoint than hidraw.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
On Sun, Nov 2, 2014 at 12:47 PM, Tom Gundersen t...@jklm.no wrote:
 Hi Andy,

 On Sun, Nov 2, 2014 at 7:57 PM, Andy Lutomirski l...@amacapital.net wrote:
 I want to get U2F (universal second factor, sometimes called security
 key or even gnubby) working on Linux.  U2F tokens are HID devices
 that speak a custom protocol.  The intent is that user code will speak
 to then using something like HIDAPI.

 The trick is that, for HIDAPI to work, something needs to recognize
 these devices and get udev to set appropriate device permissions.

 My question is: how should this be done?  The official way to
 enumerate U2F devices is to look for a HID usage page 0xf1d0
 containing usage 0x1.

 Options include:

  - A builtin udev helper that reads the sysfs report_descriptor for
 hid or hidraw devices and sets attributes accordingly (either
 ID_SECURITY_TOKEN or something more general).

 I don't think we should have such special-purpose logic in the udev core.

 [...]

 - HID core code in the kernel to add
 HID_USAGES=f1d1:lots:of:other:things to the uevent (or udev code
 to do the same).  This might end up producing a rather long string or
 some devices.

 This makes the most sense to me. We could put this logic (adapting the
 patch you posted) in src/udev/udev-builtin-usb_id.c.

How would that work?  Can't a USB device have more than one HID class
device under it?  I could imagine a future U2F keyboard that has a HID
class device for the keyboard part and another one for U2F.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
On Sun, Nov 2, 2014 at 2:45 PM, Benjamin Tissoires
benjamin.tissoi...@gmail.com wrote:
 On Sun, Nov 2, 2014 at 4:40 PM, Jiri Kosina jkos...@suse.cz wrote:
 On Sun, 2 Nov 2014, Jiri Kosina wrote:

 Alternatively, you can just write udev rule which triggers on HID devices,
 issues HIDIOCGRDESC ioctl() on the just-created hidraw node, and decides
 afterwards whether node permissions need to be altered ... right?

 Just to make myself clear here -- this is basically alternative to your
 1st option, but for cases where you want to make yourself independent on
 sysfs existence (I don't what is the craziness level of setups you are
 considering).


 I am a little bit concerned with the user space retrieving the
 descriptor, parsing it and then deciding how to set the permissions.
 It's not that it is super complex, but it will duplicate the parsing
 we already do in the kernel. Wacom tried to do that in their usb
 driver, and they never managed to fully implement one.

 I think the HID_GROUP solution is super trivial:
 - add a match on the U2F input report and set the group to
 HID_GROUP_U2F (2 lines in hid_scan_input_usage() in hid-core.c)
 - allow hid-generic to also match HID_GROUP_U2F (1 line in hid-generic.c).

 Then, the udev rule would be super trivial.

I'm confused.  What would the user-visible effect of this be?  Would
the hidraw node still show up?  What is user code supposed to match to
detect a U2F device or to otherwise set permissions?

--Andy


 Also, if we want to further extend the kernel API for U2F, the group
 will already be in place.

 Cheers,
 Benjamin



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
On Sun, Nov 2, 2014 at 3:01 PM, Benjamin Tissoires
benjamin.tissoi...@gmail.com wrote:
 On Sun, Nov 2, 2014 at 5:49 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Sun, Nov 2, 2014 at 2:45 PM, Benjamin Tissoires
 benjamin.tissoi...@gmail.com wrote:
 On Sun, Nov 2, 2014 at 4:40 PM, Jiri Kosina jkos...@suse.cz wrote:
 On Sun, 2 Nov 2014, Jiri Kosina wrote:

 Alternatively, you can just write udev rule which triggers on HID devices,
 issues HIDIOCGRDESC ioctl() on the just-created hidraw node, and decides
 afterwards whether node permissions need to be altered ... right?

 Just to make myself clear here -- this is basically alternative to your
 1st option, but for cases where you want to make yourself independent on
 sysfs existence (I don't what is the craziness level of setups you are
 considering).


 I am a little bit concerned with the user space retrieving the
 descriptor, parsing it and then deciding how to set the permissions.
 It's not that it is super complex, but it will duplicate the parsing
 we already do in the kernel. Wacom tried to do that in their usb
 driver, and they never managed to fully implement one.

 I think the HID_GROUP solution is super trivial:
 - add a match on the U2F input report and set the group to
 HID_GROUP_U2F (2 lines in hid_scan_input_usage() in hid-core.c)
 - allow hid-generic to also match HID_GROUP_U2F (1 line in hid-generic.c).

 Then, the udev rule would be super trivial.

 I'm confused.  What would the user-visible effect of this be?  Would
 the hidraw node still show up?  What is user code supposed to match to
 detect a U2F device or to otherwise set permissions?


 The only change with what we currently have is that the modalias of
 the device will be something like
 MODALIAS=hid:b0003gHID_GROUP_U2Fvp. (replace
 HID_GROUP_U2F with the proper hex value). So matching against this
 modalias is trivial, and you can just put the hidraw node rw for
 users.

Do we really want udev matching against MODALIAS, and do we really
want udev rules to hardcode hid group constants?

I'd like this idea better if we added a HID_GROUP uevent property with
a textual description, or perhaps something a little more specific.
The hid group field seems to be used for different types of things.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Supporting U2F over HID on Linux?

2014-11-02 Thread Andy Lutomirski
On Sun, Nov 2, 2014 at 4:40 PM, Benjamin Tissoires
benjamin.tissoi...@gmail.com wrote:
 On Sun, Nov 2, 2014 at 6:34 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Sun, Nov 2, 2014 at 3:01 PM, Benjamin Tissoires
 benjamin.tissoi...@gmail.com wrote:
 On Sun, Nov 2, 2014 at 5:49 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Sun, Nov 2, 2014 at 2:45 PM, Benjamin Tissoires
 benjamin.tissoi...@gmail.com wrote:
 On Sun, Nov 2, 2014 at 4:40 PM, Jiri Kosina jkos...@suse.cz wrote:
 On Sun, 2 Nov 2014, Jiri Kosina wrote:

 Alternatively, you can just write udev rule which triggers on HID 
 devices,
 issues HIDIOCGRDESC ioctl() on the just-created hidraw node, and decides
 afterwards whether node permissions need to be altered ... right?

 Just to make myself clear here -- this is basically alternative to your
 1st option, but for cases where you want to make yourself independent on
 sysfs existence (I don't what is the craziness level of setups you are
 considering).


 I am a little bit concerned with the user space retrieving the
 descriptor, parsing it and then deciding how to set the permissions.
 It's not that it is super complex, but it will duplicate the parsing
 we already do in the kernel. Wacom tried to do that in their usb
 driver, and they never managed to fully implement one.

 I think the HID_GROUP solution is super trivial:
 - add a match on the U2F input report and set the group to
 HID_GROUP_U2F (2 lines in hid_scan_input_usage() in hid-core.c)
 - allow hid-generic to also match HID_GROUP_U2F (1 line in hid-generic.c).

 Then, the udev rule would be super trivial.

 I'm confused.  What would the user-visible effect of this be?  Would
 the hidraw node still show up?  What is user code supposed to match to
 detect a U2F device or to otherwise set permissions?


 The only change with what we currently have is that the modalias of
 the device will be something like
 MODALIAS=hid:b0003gHID_GROUP_U2Fvp. (replace
 HID_GROUP_U2F with the proper hex value). So matching against this
 modalias is trivial, and you can just put the hidraw node rw for
 users.

 Do we really want udev matching against MODALIAS, and do we really
 want udev rules to hardcode hid group constants?

 That's a good question. I don't think the hid group will change in the
 future and we can guarantee that in the kernel I think.


 I'd like this idea better if we added a HID_GROUP uevent property with
 a textual description, or perhaps something a little more specific.

 This refers back to Jiri's first remark. Adding such a thing is
 doable, but do we really want/need it :/

I would tend to think that designing a HID interface for userspace
consumption might be a hint that it's the right time to give it a
better name than MODALIAS :)

--Andy


 The hid group field seems to be used for different types of things.

 yes, my proposal is definitively a (ugly) hack around the groups which
 are used to select which hid subdriver we use.


 An other question which comes to my mind is don't we want logind to
 assign the hidraw node to the proper client? We may still need to tag
 the device somehow, but if logind prevents untrusted application to
 run arbitrary code on the u2f token, that might be a little bit safer
 than allowing anybody to read/write.

We do want it to be assigned to the proper client.  My udev patch
(which will almost certainly not be accepted as-is, but it could be
fixed up) uses the report_descriptor to set ID_SECURITY_TOKEN=1, which
in turn causes an existing rule to set TAG+=uaccess, which causes
logind to do its thing.


 Sorry to raise more questions than providing answers.

No problem.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] udev: Add hidraw_id and a rule file to invoke it

2014-10-29 Thread Andy Lutomirski
On Tue, Oct 28, 2014 at 3:46 PM, Andy Lutomirski l...@amacapital.net wrote:
 So far, hidraw_id detects U2F tokens and sets:
 ID_U2F_TOKEN=1
 ID_SECURITY_TOKEN=1

 This causes the uaccess rules to apply to U2F devices.

This works for the Plug-up security key, too.

--Andy

 ---

 I've never written any udev code before.  Feedback welcome.

 If you think this doesn't belong in udev, I can try to find it another home.

  .gitignore |   1 +
  Makefile.am|  11 
  rules/60-hidraw.rules  |   7 ++
  src/udev/hidraw_id/Makefile|   1 +
  src/udev/hidraw_id/hidraw_id.c | 144 
 +
  5 files changed, 164 insertions(+)
  create mode 100644 rules/60-hidraw.rules
  create mode 12 src/udev/hidraw_id/Makefile
  create mode 100644 src/udev/hidraw_id/hidraw_id.c

 diff --git a/.gitignore b/.gitignore
 index f119b574c777..4bd3cdf08f0d 100644
 --- a/.gitignore
 +++ b/.gitignore
 @@ -34,6 +34,7 @@
  /exported
  /exported-*
  /gtk-doc.make
 +/hidraw_id
  /hostnamectl
  /install-tree
  /journalctl
 diff --git a/Makefile.am b/Makefile.am
 index fae946a388af..9f64687d32b1 100644
 --- a/Makefile.am
 +++ b/Makefile.am
 @@ -3542,6 +3542,17 @@ udevlibexec_PROGRAMS += \
 ata_id

  # 
 --
 +hidraw_id_SOURCES = \
 +   src/udev/hidraw_id/hidraw_id.c
 +
 +hidraw_id_LDADD = \
 +   libudev-internal.la \
 +   libsystemd-shared.la
 +
 +udevlibexec_PROGRAMS += \
 +   hidraw_id
 +
 +# 
 --
  cdrom_id_SOURCES = \
 src/udev/cdrom_id/cdrom_id.c

 diff --git a/rules/60-hidraw.rules b/rules/60-hidraw.rules
 new file mode 100644
 index ..1ee9c812f711
 --- /dev/null
 +++ b/rules/60-hidraw.rules
 @@ -0,0 +1,7 @@
 +# do not edit this file, it will be overwritten on update
 +
 +ACTION==remove, GOTO=hidraw_end
 +
 +SUBSYSTEM==hidraw, IMPORT{program}=hidraw_id --udev
 +
 +LABEL=keyboard_end
 diff --git a/src/udev/hidraw_id/Makefile b/src/udev/hidraw_id/Makefile
 new file mode 12
 index ..d0b0e8e0086f
 --- /dev/null
 +++ b/src/udev/hidraw_id/Makefile
 @@ -0,0 +1 @@
 +../Makefile
 \ No newline at end of file
 diff --git a/src/udev/hidraw_id/hidraw_id.c b/src/udev/hidraw_id/hidraw_id.c
 new file mode 100644
 index ..e32f222f22f9
 --- /dev/null
 +++ b/src/udev/hidraw_id/hidraw_id.c
 @@ -0,0 +1,144 @@
 +/*
 + * Copyright (c) Andrew Lutomirski, 2014
 + *
 + * This program is free software: you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License as published by
 + * the Free Software Foundation, either version 2 of the License, or
 + * (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program.  If not, see http://www.gnu.org/licenses/.
 + */
 +
 +#include stdio.h
 +#include string.h
 +#include sys/types.h
 +#include sys/stat.h
 +#include fcntl.h
 +#include unistd.h
 +
 +#include libudev.h
 +#include libudev-private.h
 +
 +_printf_(6,0)
 +static void log_fn(struct udev *udev, int priority,
 +   const char *file, int line, const char *fn,
 +   const char *format, va_list args)
 +{
 +log_metav(priority, file, line, fn, format, args);
 +}
 +
 +int main(int argc, char **argv)
 +{
 +struct udev *udev;
 +struct udev_device *dev, *hiddev;
 +char path[4096];
 +unsigned char desc[4096];
 +int desclen;
 +int fd = -1;
 +int i;
 +int ret = 1;
 +unsigned int usage_page = 0;
 +int is_u2f_token = 0;
 +
 +if (argc != 2) {
 +fprintf(stderr, Usage: hidraw_id SYSFS_PATH|--udev\n);
 +return 1;
 +}
 +
 +log_parse_environment();
 +log_open();
 +
 +udev = udev_new();
 +
 +udev_set_log_fn(udev, log_fn);
 +
 +if (!strcmp(argv[1], --udev))
 +dev = udev_device_new_from_environment(udev);
 +else
 +dev = udev_device_new_from_syspath(udev, argv[1]);
 +
 +if (!dev)
 +goto out;
 +
 +hiddev = udev_device_get_parent(dev);
 +if (!hiddev)
 +goto out;
 +
 +if (snprintf(path, sizeof(path), %s/report_descriptor,
 + udev_device_get_syspath(hiddev))  (int)sizeof(path))
 +return 1;
 +
 +fd = open(path, O_RDONLY | O_NOFOLLOW);
 +if (fd == -1)
 +goto out;
 +
 +desclen = read(fd, desc, sizeof(desc));
 +if (desclen = 0

Re: [systemd-devel] Writing a udev rule for U2F security tokens?

2014-10-28 Thread Andy Lutomirski
On Tue, Oct 28, 2014 at 1:40 AM, Greg KH gre...@linuxfoundation.org wrote:
 On Mon, Oct 27, 2014 at 04:37:14PM -0700, Andy Lutomirski wrote:
 On Mon, Oct 27, 2014 at 4:32 PM, Greg KH gre...@linuxfoundation.org wrote:
  On Mon, Oct 27, 2014 at 04:12:30PM -0700, Andy Lutomirski wrote:
  Hi-
 
  I'd like to write a generic udev rule for U2F security tokens and to
  possibly get it integrated into systemd / udev, but I'm not sure how
  to write it in the first place.
 
  U2F tokens are USB HID devices that have a usage page 0xF1D0 that
  contains usage 0x01.  The rule should match any hidraw device with
  that property.  Can this be done without a user helper?  Is there an
  existing helper in which it would make sense to add such a check?
 
  Here's the draft USB forum allocation:
 
  http://www.usb.org/developers/hidpage/HUTRR48.pdf
 
  Here's the draft spec from the FIDO Alliance:
 
  https://fidoalliance.org/specs/fido-u2f-HID-protocol-v1.0-rd-20141008.pdf
 
  In practice, I expect little change between the draft and final specs,
  since these devices are already for sale and Chromium supports them.
 
  I don't understand, what would a udev rule do with these devices?
  Shouldn't they be exported automatically using the hid raw interface
  so that userspace can talk to them?  What else needs to be done?

 Wow, I clearly failed to transfer my thoughts into email...

 I want to set ID_SECURITY_TOKEN=1 or, more generally, cause the
 uaccess tag to be set so that users have permission to use the token.

 This rule works in Fedora for the existing tokens by Yubico:

 KERNEL==hidraw*, SUBSYSTEM==hidraw, ATTRS{idVendor}==1050,
 ATTRS{idProduct}==0113|0114|0115|0116|0120,
 ENV{ID_SECURITY_TOKEN}=1

 but it won't work for other brands of U2F token.

 If there's no sysfs attribute that you can read directly to determine
 that it is a a U2F token, then it's not easy to write a udev rule.

 You can write a simple program to read the hid pages from the hidraw
 interface, and then set an environment variable from there if the FIDO
 Alliance Page is present.  You can use a udev rule for that, but it
 will have to be an external tool.

Would a tool like that be considered appropriate to distribute with
udev?  It would have somewhat unpleasant overhead for what is
currently a niche use case.

I suppose the kernel could also be modified to expose this, but doing
that cleanly will involve exposing all the usage pages in sysfs, which
is more complexity than I really want to add.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] udev: Add hidraw_id and a rule file to invoke it

2014-10-28 Thread Andy Lutomirski
So far, hidraw_id detects U2F tokens and sets:
ID_U2F_TOKEN=1
ID_SECURITY_TOKEN=1

This causes the uaccess rules to apply to U2F devices.
---

I've never written any udev code before.  Feedback welcome.

If you think this doesn't belong in udev, I can try to find it another home.

 .gitignore |   1 +
 Makefile.am|  11 
 rules/60-hidraw.rules  |   7 ++
 src/udev/hidraw_id/Makefile|   1 +
 src/udev/hidraw_id/hidraw_id.c | 144 +
 5 files changed, 164 insertions(+)
 create mode 100644 rules/60-hidraw.rules
 create mode 12 src/udev/hidraw_id/Makefile
 create mode 100644 src/udev/hidraw_id/hidraw_id.c

diff --git a/.gitignore b/.gitignore
index f119b574c777..4bd3cdf08f0d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,6 +34,7 @@
 /exported
 /exported-*
 /gtk-doc.make
+/hidraw_id
 /hostnamectl
 /install-tree
 /journalctl
diff --git a/Makefile.am b/Makefile.am
index fae946a388af..9f64687d32b1 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -3542,6 +3542,17 @@ udevlibexec_PROGRAMS += \
ata_id
 
 # 
--
+hidraw_id_SOURCES = \
+   src/udev/hidraw_id/hidraw_id.c
+
+hidraw_id_LDADD = \
+   libudev-internal.la \
+   libsystemd-shared.la
+
+udevlibexec_PROGRAMS += \
+   hidraw_id
+
+# 
--
 cdrom_id_SOURCES = \
src/udev/cdrom_id/cdrom_id.c
 
diff --git a/rules/60-hidraw.rules b/rules/60-hidraw.rules
new file mode 100644
index ..1ee9c812f711
--- /dev/null
+++ b/rules/60-hidraw.rules
@@ -0,0 +1,7 @@
+# do not edit this file, it will be overwritten on update
+
+ACTION==remove, GOTO=hidraw_end
+
+SUBSYSTEM==hidraw, IMPORT{program}=hidraw_id --udev
+
+LABEL=keyboard_end
diff --git a/src/udev/hidraw_id/Makefile b/src/udev/hidraw_id/Makefile
new file mode 12
index ..d0b0e8e0086f
--- /dev/null
+++ b/src/udev/hidraw_id/Makefile
@@ -0,0 +1 @@
+../Makefile
\ No newline at end of file
diff --git a/src/udev/hidraw_id/hidraw_id.c b/src/udev/hidraw_id/hidraw_id.c
new file mode 100644
index ..e32f222f22f9
--- /dev/null
+++ b/src/udev/hidraw_id/hidraw_id.c
@@ -0,0 +1,144 @@
+/*
+ * Copyright (c) Andrew Lutomirski, 2014
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/.
+ */
+
+#include stdio.h
+#include string.h
+#include sys/types.h
+#include sys/stat.h
+#include fcntl.h
+#include unistd.h
+
+#include libudev.h
+#include libudev-private.h
+
+_printf_(6,0)
+static void log_fn(struct udev *udev, int priority,
+   const char *file, int line, const char *fn,
+   const char *format, va_list args)
+{
+log_metav(priority, file, line, fn, format, args);
+}
+
+int main(int argc, char **argv)
+{
+struct udev *udev;
+struct udev_device *dev, *hiddev;
+char path[4096];
+unsigned char desc[4096];
+int desclen;
+int fd = -1;
+int i;
+int ret = 1;
+unsigned int usage_page = 0;
+int is_u2f_token = 0;
+
+if (argc != 2) {
+fprintf(stderr, Usage: hidraw_id SYSFS_PATH|--udev\n);
+return 1;
+}
+
+log_parse_environment();
+log_open();
+
+udev = udev_new();
+
+udev_set_log_fn(udev, log_fn);
+
+if (!strcmp(argv[1], --udev))
+dev = udev_device_new_from_environment(udev);
+else
+dev = udev_device_new_from_syspath(udev, argv[1]);
+
+if (!dev)
+goto out;
+
+hiddev = udev_device_get_parent(dev);
+if (!hiddev)
+goto out;
+
+if (snprintf(path, sizeof(path), %s/report_descriptor,
+ udev_device_get_syspath(hiddev))  (int)sizeof(path))
+return 1;
+
+fd = open(path, O_RDONLY | O_NOFOLLOW);
+if (fd == -1)
+goto out;
+
+desclen = read(fd, desc, sizeof(desc));
+if (desclen = 0)
+goto out;
+
+/* Parse the report descriptor. */
+for (i = 0; i  desclen; ) {
+unsigned char tag = desc[i]  4;
+unsigned char type = (desc[i]  2)  0x3;
+unsigned char sizecode = desc[i]  0x3;
+int size, j;

[systemd-devel] Writing a udev rule for U2F security tokens?

2014-10-27 Thread Andy Lutomirski
Hi-

I'd like to write a generic udev rule for U2F security tokens and to
possibly get it integrated into systemd / udev, but I'm not sure how
to write it in the first place.

U2F tokens are USB HID devices that have a usage page 0xF1D0 that
contains usage 0x01.  The rule should match any hidraw device with
that property.  Can this be done without a user helper?  Is there an
existing helper in which it would make sense to add such a check?

Here's the draft USB forum allocation:

http://www.usb.org/developers/hidpage/HUTRR48.pdf

Here's the draft spec from the FIDO Alliance:

https://fidoalliance.org/specs/fido-u2f-HID-protocol-v1.0-rd-20141008.pdf

In practice, I expect little change between the draft and final specs,
since these devices are already for sale and Chromium supports them.

Thanks,
Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Writing a udev rule for U2F security tokens?

2014-10-27 Thread Andy Lutomirski
On Mon, Oct 27, 2014 at 4:32 PM, Greg KH gre...@linuxfoundation.org wrote:
 On Mon, Oct 27, 2014 at 04:12:30PM -0700, Andy Lutomirski wrote:
 Hi-

 I'd like to write a generic udev rule for U2F security tokens and to
 possibly get it integrated into systemd / udev, but I'm not sure how
 to write it in the first place.

 U2F tokens are USB HID devices that have a usage page 0xF1D0 that
 contains usage 0x01.  The rule should match any hidraw device with
 that property.  Can this be done without a user helper?  Is there an
 existing helper in which it would make sense to add such a check?

 Here's the draft USB forum allocation:

 http://www.usb.org/developers/hidpage/HUTRR48.pdf

 Here's the draft spec from the FIDO Alliance:

 https://fidoalliance.org/specs/fido-u2f-HID-protocol-v1.0-rd-20141008.pdf

 In practice, I expect little change between the draft and final specs,
 since these devices are already for sale and Chromium supports them.

 I don't understand, what would a udev rule do with these devices?
 Shouldn't they be exported automatically using the hid raw interface
 so that userspace can talk to them?  What else needs to be done?

Wow, I clearly failed to transfer my thoughts into email...

I want to set ID_SECURITY_TOKEN=1 or, more generally, cause the
uaccess tag to be set so that users have permission to use the token.

This rule works in Fedora for the existing tokens by Yubico:

KERNEL==hidraw*, SUBSYSTEM==hidraw, ATTRS{idVendor}==1050,
ATTRS{idProduct}==0113|0114|0115|0116|0120,
ENV{ID_SECURITY_TOKEN}=1

but it won't work for other brands of U2F token.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] udev: fail firmware loading immediately if no search path is defined

2013-08-07 Thread Andy Lutomirski
On Wed, Aug 7, 2013 at 12:52 AM, Maarten Lankhorst
m.b.lankho...@gmail.com wrote:
 Op 07-08-13 02:26, Andy Lutomirski schreef:
 On Tue, Aug 6, 2013 at 5:24 PM, Tom Gundersen t...@jklm.no wrote:
 On 6 Aug 2013 18:32, Bryan Kadzban br...@kadzban.is-a-geek.net wrote:
 On Tue, Aug 06, 2013 at 11:17:17AM +0200, Tom Gundersen wrote:
 On Tue, Aug 6, 2013 at 11:11 AM, Tom Gundersen t...@jklm.no wrote:
 On Tue, Aug 6, 2013 at 10:20 AM, Maarten Lankhorst
 m.b.lankho...@gmail.com wrote:
 Op 05-08-13 18:29, Andy Lutomirski schreef:
 The systemd commit below can delay firmware loading by multiple
 minutes if CONFIG_FW_LOADER_USER_HELPER=y.  Unfortunately no one
 noticed that the systemd-udev change would break new kernels as well
 as old kernels.

 Since the kernel apparently can't count on reasonable userspace
 support, turn this thing off by default.

 commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a
 Author: Tom Gundersen t...@jklm.no
 Date:   Mon Mar 18 15:12:18 2013 +0100

 udev: make firmware loading optional and disable by default

 Distros that whish to support old kernels should set

 --with-firmware-dirs=/usr/lib/firmware/updates:/usr/lib/firmware
 to retain the old behaviour.

 methinks this patch should be reverted then,
 Well, all the code is still there, so it can be enabled if anyone
 wants it.

 or a stub should be added to udev to always fail firmware loading so
 timeouts don't occur.
 I think the only use (if any) of a userspace firmware loader would be
 for anyone who wants a custom one (i.e., not udev), so we shouldn't
 just fail the loading from udev unconditionally.

 How about we just improve the udev documentation a bit, similar to
 Andy's kernel patch?
 Sorry, I should first have checked. We already document this in the
 README:

Userspace firmware loading is deprecated, will go away, and
sometimes causes problems:
  CONFIG_FW_LOADER_USER_HELPER=n
 ...And this patch is making the kernel default to the correct behavior,
 instead of the now-broken-by-udev behavior.

 I'm not sure I see the issue with it?  :-)
 Oh yeah this patch is totally the right thing to do, I was just arguing that
 there is nothing to be done on the udev side.

 (Add me to the list of people that think udev is broken too, fwiw.  But
 let's at least not leave *both* sides in a broken-by-default state.)
 Well I don't think it is too much to ask that the kernel and udev should be
 configured in a consistent way. Especially as thing still work even if you
 get it wrong, albeit with a delay.
 Except that the current defaults are inconsistent and there is no
 explanation anywhere in the logs when this is screwed up.

 So what is wrong with my 'fail in udev immediately if not configured' idea? 
 In that case it
 doesn't matter whether CONFIG_FW_LOADER_USER_HELPER is set or not.

 You could even print a useful message for the user in udev to the log, so 
 they have an idea of what
 happened. Breaking udev on older still supported kernels by default without 
 printing any debug info
 is silly, and the only cost is a small increase in disk space when unused. I 
 did so in below patch.

Seems reasonable to me.  It might be worth adding something to the
message to suggest turning off CONFIG_FW_LOADER_USER_HELPER.


 ~Maarten

 I converted  ELEMENTSOF to != ELEMENTSOF in the loop to work correctly when 
 the array is empty.
 Most code in udev-builtin-firmware is eliminated at -O2 optimization level 
 when FIRMWARE_PATH
 is not explicitly set.

 8
 diff --git a/Makefile.am b/Makefile.am
 index b8b8d06..2097629 100644
 --- a/Makefile.am
 +++ b/Makefile.am
 @@ -2235,6 +2235,7 @@ libudev_core_la_SOURCES = \
 src/udev/udev-ctrl.c \
 src/udev/udev-builtin.c \
 src/udev/udev-builtin-btrfs.c \
 +   src/udev/udev-builtin-firmware.c \
 src/udev/udev-builtin-hwdb.c \
 src/udev/udev-builtin-input_id.c \
 src/udev/udev-builtin-keyboard.c \
 @@ -2271,14 +2272,6 @@ libudev_core_la_CPPFLAGS = \
 $(AM_CPPFLAGS) \
 -DFIRMWARE_PATH=$(FIRMWARE_PATH)

 -if ENABLE_FIRMWARE
 -libudev_core_la_SOURCES += \
 -   src/udev/udev-builtin-firmware.c
 -
 -dist_udevrules_DATA += \
 -   rules/50-firmware.rules
 -endif
 -
  if HAVE_KMOD
  libudev_core_la_SOURCES += \
 src/udev/udev-builtin-kmod.c
 diff --git a/configure.ac b/configure.ac
 index 0ecc716..dc7a3e3 100644
 --- a/configure.ac
 +++ b/configure.ac
 @@ -823,8 +823,6 @@ for i in $with_firmware_path; do
  done
  IFS=$OLD_IFS
  AC_SUBST(FIRMWARE_PATH)
 -AS_IF([test x${FIRMWARE_PATH} != x], [ AC_DEFINE(HAVE_FIRMWARE, 1, 
 [Define if FIRMWARE is available]) ])
 -AM_CONDITIONAL(ENABLE_FIRMWARE, [test x${FIRMWARE_PATH} != x])

  # 
 --
  AC_ARG_ENABLE([gudev],
 diff --git a/rules/50-firmware.rules b/rules/50-firmware.rules
 deleted file mode 100644
 index f0ae684..000
 --- a/rules/50-firmware.rules
 +++ /dev/null

Re: [systemd-devel] [PATCH] Change CONFIG_FW_LOADER_USER_HELPER to default n and don't select it

2013-08-06 Thread Andy Lutomirski
On Tue, Aug 6, 2013 at 2:17 AM, Tom Gundersen t...@jklm.no wrote:
 On Tue, Aug 6, 2013 at 11:11 AM, Tom Gundersen t...@jklm.no wrote:
 On Tue, Aug 6, 2013 at 10:20 AM, Maarten Lankhorst
 m.b.lankho...@gmail.com wrote:
 Op 05-08-13 18:29, Andy Lutomirski schreef:
 The systemd commit below can delay firmware loading by multiple
 minutes if CONFIG_FW_LOADER_USER_HELPER=y.  Unfortunately no one
 noticed that the systemd-udev change would break new kernels as well
 as old kernels.

 Since the kernel apparently can't count on reasonable userspace
 support, turn this thing off by default.

 commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a
 Author: Tom Gundersen t...@jklm.no
 Date:   Mon Mar 18 15:12:18 2013 +0100

 udev: make firmware loading optional and disable by default

 Distros that whish to support old kernels should set
   --with-firmware-dirs=/usr/lib/firmware/updates:/usr/lib/firmware
 to retain the old behaviour.

 methinks this patch should be reverted then,

 Well, all the code is still there, so it can be enabled if anyone wants it.

 or a stub should be added to udev to always fail firmware loading so 
 timeouts don't occur.

 I think the only use (if any) of a userspace firmware loader would be
 for anyone who wants a custom one (i.e., not udev), so we shouldn't
 just fail the loading from udev unconditionally.

 How about we just improve the udev documentation a bit, similar to
 Andy's kernel patch?

 Sorry, I should first have checked. We already document this in the README:

Userspace firmware loading is deprecated, will go away, and
sometimes causes problems:
  CONFIG_FW_LOADER_USER_HELPER=n

TBH, the udev README is the last thing I'm going to check to figure
out why I don't have wifi for a couple minutes after boot.  Also, the
message is missing the point.  It's not that it's deprecated and
sometimes causes problems -- it's that udev *changed behavior* and
breaks your system if you have CONFIG_FW_LOADER_USER_HELPER=y.

If udev logged something (for a couple of years), that would be a
different story.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] Change CONFIG_FW_LOADER_USER_HELPER to default n and don't select it

2013-08-06 Thread Andy Lutomirski
On Tue, Aug 6, 2013 at 5:24 PM, Tom Gundersen t...@jklm.no wrote:

 On 6 Aug 2013 18:32, Bryan Kadzban br...@kadzban.is-a-geek.net wrote:

 On Tue, Aug 06, 2013 at 11:17:17AM +0200, Tom Gundersen wrote:
  On Tue, Aug 6, 2013 at 11:11 AM, Tom Gundersen t...@jklm.no wrote:
   On Tue, Aug 6, 2013 at 10:20 AM, Maarten Lankhorst
   m.b.lankho...@gmail.com wrote:
   Op 05-08-13 18:29, Andy Lutomirski schreef:
   The systemd commit below can delay firmware loading by multiple
   minutes if CONFIG_FW_LOADER_USER_HELPER=y.  Unfortunately no one
   noticed that the systemd-udev change would break new kernels as well
   as old kernels.
  
   Since the kernel apparently can't count on reasonable userspace
   support, turn this thing off by default.
  
   commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a
   Author: Tom Gundersen t...@jklm.no
   Date:   Mon Mar 18 15:12:18 2013 +0100
  
   udev: make firmware loading optional and disable by default
  
   Distros that whish to support old kernels should set
  
   --with-firmware-dirs=/usr/lib/firmware/updates:/usr/lib/firmware
   to retain the old behaviour.
  
   methinks this patch should be reverted then,
  
   Well, all the code is still there, so it can be enabled if anyone
   wants it.
  
   or a stub should be added to udev to always fail firmware loading so
   timeouts don't occur.
  
   I think the only use (if any) of a userspace firmware loader would be
   for anyone who wants a custom one (i.e., not udev), so we shouldn't
   just fail the loading from udev unconditionally.
  
   How about we just improve the udev documentation a bit, similar to
   Andy's kernel patch?
 
  Sorry, I should first have checked. We already document this in the
  README:
 
  Userspace firmware loading is deprecated, will go away, and
  sometimes causes problems:
CONFIG_FW_LOADER_USER_HELPER=n

 ...And this patch is making the kernel default to the correct behavior,
 instead of the now-broken-by-udev behavior.

 I'm not sure I see the issue with it?  :-)

 Oh yeah this patch is totally the right thing to do, I was just arguing that
 there is nothing to be done on the udev side.

 (Add me to the list of people that think udev is broken too, fwiw.  But
 let's at least not leave *both* sides in a broken-by-default state.)

 Well I don't think it is too much to ask that the kernel and udev should be
 configured in a consistent way. Especially as thing still work even if you
 get it wrong, albeit with a delay.

Except that the current defaults are inconsistent and there is no
explanation anywhere in the logs when this is screwed up.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Slow firmware timeouts again (Re: [3.11 regression?] iwlwifi firmware takes two minutes to load)

2013-08-05 Thread Andy Lutomirski
On Mon, Aug 5, 2013 at 4:18 AM, Kay Sievers k...@vrfy.org wrote:
 On Fri, Aug 2, 2013 at 6:28 PM, Zbigniew Jędrzejewski-Szmek
 zbys...@in.waw.pl wrote:
 On Fri, Aug 02, 2013 at 09:04:44AM -0700, Andy Lutomirski wrote:
 CONFIG_FW_LOADER_USER_HELPER=y
 Do you need this? Unsetting this should help.

 This option enables / disables the invocation of user-helper
 (e.g. udev) for loading firmware files as a fallback after the
 direct file loading in kernel fails. The user-mode helper is
 no longer required unless you have a special firmware file that
 resides in a non-standard path.

 On recent systems, if the kernel configures
 CONFIG_FW_LOADER_USER_HELPER=y and a firmware is not found by the
 kernel, the kernel will issue a request which is ignored by userspace
 and will block in that for 60 seconds.

 Udev is no longer in the game of firmware loading, not even as a
 fallback, it will just completely ignore all kernel firmware class
 events.

 The source code in udev to handle firmware requests is disabled by
 default, currently still kept around for old kernels without the
 in-kernel firmware loader, but it will be deleted in the near future.

Any chance you'd consider a less regression-inducing path to getting
rid of this feature?  For example, have udev warn and immediate fail
firmware loading requests for a few releases, then just warn, then
drop support?

Meanwhile, CONFIG_FW_LOADER_USER_HELPER is still default y (!), so
udev has introduced massive bootup delays into the default
configuration with no warning.  It might be nice to change it to
default n, get rid of everything that selects it, and possible even
rename it to something with LEGACY or OBSOLETE in the name so that
make oldconfig will prompt.

--Andy


 Any issues with firmware timeouts should be addressed in the kernel by
 disabling CONFIG_FW_LOADER_USER_HELPER or by removing the fallback
 code from the in-kernel loader.

 Kay



-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH] Change CONFIG_FW_LOADER_USER_HELPER to default n and don't select it

2013-08-05 Thread Andy Lutomirski
The systemd commit below can delay firmware loading by multiple
minutes if CONFIG_FW_LOADER_USER_HELPER=y.  Unfortunately no one
noticed that the systemd-udev change would break new kernels as well
as old kernels.

Since the kernel apparently can't count on reasonable userspace
support, turn this thing off by default.

commit a3bd8447be4ea2ce230eb8ae0e815c04d85fa15a
Author: Tom Gundersen t...@jklm.no
Date:   Mon Mar 18 15:12:18 2013 +0100

udev: make firmware loading optional and disable by default

Distros that whish to support old kernels should set
  --with-firmware-dirs=/usr/lib/firmware/updates:/usr/lib/firmware
to retain the old behaviour.
---
 drivers/base/Kconfig | 15 +++
 drivers/firmware/Kconfig |  1 -
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 5daa259..de3903e 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -146,13 +146,20 @@ config EXTRA_FIRMWARE_DIR
 config FW_LOADER_USER_HELPER
bool Fallback user-helper invocation for firmware loading
depends on FW_LOADER
-   default y
+   default n
help
  This option enables / disables the invocation of user-helper
  (e.g. udev) for loading firmware files as a fallback after the
- direct file loading in kernel fails.  The user-mode helper is
- no longer required unless you have a special firmware file that
- resides in a non-standard path.
+ direct file loading in kernel fails.
+
+ Since March 2013, a default udev build does not understand
+ firmware loading requests.  These udev versions will not
+ even indicate failure; instead they cause long timeouts.
+ This can dramatically slow down the boot process.
+
+ Say Y only if you have special firmware-loading requirements
+ and if you have a non-standard helper that will handle these
+ requests.
 
 config DEBUG_DRIVER
bool Driver Core verbose debug messages
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 07478728..9387630 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -64,7 +64,6 @@ config DELL_RBU
tristate BIOS update support for DELL systems via sysfs
depends on X86
select FW_LOADER
-   select FW_LOADER_USER_HELPER
help
 Say m if you want to have the option of updating the BIOS for your
 DELL system. Note you need a Dell OpenManage or Dell Update package 
(DUP)
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Slow firmware timeouts again (Re: [3.11 regression?] iwlwifi firmware takes two minutes to load)

2013-08-02 Thread Andy Lutomirski
[cc: linux-kernel, linux-hotplug, and systemd-devel.  This is 3.11-rc3+]

On Fri, Aug 2, 2013 at 12:38 AM, Johannes Berg
johan...@sipsolutions.net wrote:
 On Thu, 2013-08-01 at 21:38 -0700, Andy Lutomirski wrote:
 At boot, I get:
 [   12.537108] iwlwifi :03:00.0: irq 51 for MSI/MSI-X
 ...
 [  132.676781] iwlwifi :03:00.0: loaded firmware version 9.221.4.1
 build 25532 op_mode iwldvm

 This sounds familiar, but wasn't it fixed awhile ago?

 It wasn't exactly fixed and it's really more of a userspace problem - we
 probably request firmware version 8, and then it takes 30 seconds to
 time out for each of 8,7,6,5, after which the next request for 4 is
 successful.

Why's it requesting those firmwares?  They don't seem to exist on
intellinuxwireless.org.

I have:

CONFIG_FW_LOADER=y
# CONFIG_FIRMWARE_IN_KERNEL is not set
CONFIG_EXTRA_FIRMWARE=
CONFIG_FW_LOADER_USER_HELPER=y



 I don't know why your userspace isn't behaving differently though.

 johannes




-- 
Andy Lutomirski
AMA Capital Management, LLC
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Slow firmware timeouts again (Re: [3.11 regression?] iwlwifi firmware takes two minutes to load)

2013-08-02 Thread Andy Lutomirski
On Fri, Aug 2, 2013 at 9:21 AM, Johannes Berg johan...@sipsolutions.net wrote:
 On Fri, 2013-08-02 at 09:04 -0700, Andy Lutomirski wrote:

  It wasn't exactly fixed and it's really more of a userspace problem - we
  probably request firmware version 8, and then it takes 30 seconds to
  time out for each of 8,7,6,5, after which the next request for 4 is
  successful.

 Why's it requesting those firmwares?  They don't seem to exist on
 intellinuxwireless.org.

 Well for one you've never even mentioned what device you have, and then
 also it's not requesting 8/7 only 6,5,4 -- I guess the timeout was
 increased to 60 seconds (or I'm remembering wrong and it always was? I
 thought it was 30s)

I have an Ultimate-N 6300 (rev 35).  It's requesting at least
versions 6 and 5 (I saw them in udevadm monitor).  It looks like the
g2a and g2b variants have -5 and -6 versions, but 6000-4 appears to be
the only relevant version for my hardware.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-25 Thread Andy Lutomirski
On Jun 25, 2013 2:43 AM, Lennart Poettering lenn...@poettering.net
wrote:

 On Mon, 24.06.13 17:09, Andy Lutomirski (l...@amacapital.net) wrote:

 
  On Mon, Jun 24, 2013 at 4:57 PM, Lennart Poettering
  lenn...@poettering.net wrote:
   On Mon, 24.06.13 16:01, Andy Lutomirski (l...@amacapital.net) wrote:
  
   AFAICT the main reason that systemd uses cgroup is to efficiently
   track which service various processes came from and to send signals,
   and it seems like that use case could be handled without cgroups at
   all by creative use of subreapers and a syscall to broadcast a signal
   to everything that has a given subreaper as an ancestor.  In that
   case, systemd could be asked to stay away from cgroups even in the
   single-hierarchy case.
  
   systemd uses cgroups to manage services. Managing services means many
   things. Among them: keeping track of processes, listing processes of a
   service, killing processes of a service, doing per-service logging
   (which means reliably, immediately, and race-freely tracing back
   messages to the service which logged them), about 55 other things, and
   also resource management.
  
   I don't see how I can do anything of this without something like
   cgroups, i.e. hierarchial, resource management involved systemd which
   allows me to securely put labels on processes.
 
  Boneheaded straw-man proposal: two new syscalls and a few spare
processes.
 
  int sys_task_reaper(int tid): Returns the reaper for the task tid
  (which is 1 if there's no subreaper).  (This could just as easily be a
  file in /proc.)
 
  int sys_killall_under_subreaper(int subreaper, int sig): Broadcasts
  sig to all tasks under subreaper (excluding subreaper).  Guarantees
  that, even if those tasks are forking, they all get the signal.
 
  Then, when starting a service, systemd forks, sets the child to be a
  subreaper, then forks that child again to exec the service.
 
  Does this do everything that's needed?

 No. It doesn't do anything that's needed. How do I list all PIDs in a
 service with this?

Walk /proc/subreaper/children recursively.  A kernel patch to make that
field show up unconditionally instead of hiding under EXPERT would help.

 How do I determine the service of a PID?

Call sys_task_reaper, then look up what service that subreaper comes from.

 How do i do
 resource manage with this?

With cgroups, unless the admin has configured systemd not to use cgroups,
in which case you don't.  (The whole point would be to keep
DefaultControllers= without using the one and only cgroup hierarchy.)

--Andy


  sys_task_reaper is trivial to
  implement (that functionality is already there in the reparenting
  code), and sys_killall_under_subreaper is probably not so bad.
 
  This has one main downside I can think of: it wastes a decent number
  of processes (one subreaper per service).

 Yeah, also the downside that it doesn't do what we need.

 Lennart

 --
 Lennart Poettering - Red Hat, Inc.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski

On 06/21/2013 10:36 AM, Lennart Poettering wrote:


2) This hierarchy becomes private property of systemd. systemd will set
it up. Systemd will maintain it. Systemd will rearrange it. Other
software that wants to make use of cgroups can do so only through
systemd's APIs. This single-writer logic is absolutely necessary, since
interdependencies between the various controllers, the various
attributes, the various cgroups are non-obvious and we simply cannot
allow that cgroup users alter the tree independently of each other
forever. Due to all this: The Pax Cgroup document is a thing of the
past, it is dead.




If you are using non-trivial cgroup setups with systemd right now, then
things will change for you. We will provide you with similar
functionality as before, but things will be different and less
low-level. As long as you only used the high-level options such as
CPUShares, MemoryLimit and so on you should be on the safe side.



Hmm.  This may be tricky for my use case.  Here are a few issues.  For 
all I know, they may already be supported (or planned), but I don't want 
to get caught.


1. I put all the entire world into a separate, highly constrained 
cgroup.  My real-time code runs outside that cgroup.  This seems to 
exactly what slices are for, but I need kernel threads to go in to the 
constrained cgroup.  Will systemd support this?


2. I manage services and tasks outside systemd (for one thing, I 
currently use Ubuntu, but even if I were on Fedora, I have a bunch of 
fine-grained things that figure out how they're supposed to allocate 
resources, and porting them to systemd just to keep working in the new 
world order would be a PITA [1]).


(cgroups have the odd feature that they are per-task, not per thread 
group, and the systemd proposal seems likely to break anything that 
actually wants task granularity.  I may actually want to use this, even 
though it's a bit evil -- my real-time thread groups have non-real-time 
threads.)


I think that what I want are something like sub-unit cgroups -- I want 
to be able to ask systemd to further subdivide the group for my unit, 
login session, or whatever.  Would this be reasonable?  (Another way of 
thinking of this is that a unit would have a whole cgroup hierarchy 
instead of just one cgroup.)


I think that the single-hierarchy model will require that I subdivide my 
user session so that the default sub-unit cgroup is constrained 
similarly to the default slice.  I'll lose functionality, but I don't 
think this is a showstopper.


A different approach would be to allow units to (with systemd's 
cooperation) escape into their own, dynamically created unit.  This 
seems kind of awful.


3. My code runs unprivileged, but it still wants to configure itself. 
If needed, I can write a little privileged daemon to handle the systemd 
calls.


I think I can get away without anything fancy if a unit (login session?) 
grant the right to manipulate sub-unit cgroups to a non-root user.


4. As mentioned, I'm on Ubuntu some of the time.  I'd like to keep the 
same code working on systemd and non-systemd systems.


How hard would it be to run systemd as just a cgroup controller?  That 
is, have systemd create its slices, run exactly one unit that represents 
the whole system, and let other things use the cgroup API.



[1] Some day, I might convert my code to use a session systemd instance. 
 I'm not holding my breath, but it could be nice.


--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 6:27 AM, Lennart Poettering
lenn...@poettering.net wrote:
 On Sat, 22.06.13 15:19, Andy Lutomirski (l...@amacapital.net) wrote:


 2. I manage services and tasks outside systemd (for one thing, I
 currently use Ubuntu, but even if I were on Fedora, I have a bunch
 of fine-grained things that figure out how they're supposed to
 allocate resources, and porting them to systemd just to keep working
 in the new world order would be a PITA [1]).


[...]


 I think that what I want are something like sub-unit cgroups -- I
 want to be able to ask systemd to further subdivide the group for my
 unit, login session, or whatever.  Would this be reasonable?
 (Another way of thinking of this is that a unit would have a whole
 cgroup hierarchy instead of just one cgroup.)

 The idea is not even to allow this. Basically, if you want to partitions
 your daemon into different cgroups you need to do that through systemd's
 abstractions: slices and services. To make this more palatable we'll
 introduce throw-away units though, so that you can dynamically run
 something as a workload and don't need to be concerned about naming
 this, or cleaning it up.


Hmm.  My particular software can maybe live with this with unpleasant
modifications, but this will break anything that, say, accepts a
connection from a client, forks into a (possibly new) cgroup based on
the identity of that client, and then does something.

How can this support containers or the use of cgroups in a
non-systemwide systemd instance?  Containers may no longer be allowed
to escape from the cgroup they start in, but there should (IMO) still
be a way for things to subdivide their cgroup-controlled resources.

If I want to have a hierarchy more than two levels deep, I suspect I'm
SOL under this model.  If I'm understanding correctly, there will be
slices, then units, and that's it.


 4. As mentioned, I'm on Ubuntu some of the time.  I'd like to keep
 the same code working on systemd and non-systemd systems.

 How hard would it be to run systemd as just a cgroup controller?
 That is, have systemd create its slices, run exactly one unit that
 represents the whole system, and let other things use the cgroup
 API.

 I have no idea, I don't develop Ubuntu. They will have to come up with
 some cgroup maintenance daemon of their own. As I know them they'll
 either do a port of the systemd counter part (but that's going to be
 tough!), or they'll stick something half-baked into Upstart...

 Sorry, if this all sounds a bit disappointing. But yeah, this all is not
 a trivial change...


I'm worried that the impedance mismatch between systemd and any other
possible API is going to be enormous.  On systemd, I'll have to:

 - Create a throwaway unit
 - Figure out how to wire up stdout and stderr correctly (I use them
for communication between processes)
 - Translate the current directory, the environment, etc. into systemd
configuration
 - Translate my desired resource controls into systemd's let's
pretend that there aren't really cgroups underlying it configuration
 - Start the throwaway unit
 - Figure out how to get notified when it finishes

Without systemd, I'll have to:

 - fork()
 - Ask whatever is managing cgroups to switch me to a different cgroup
 - exec()

This is going to suck, I think.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 12:10 PM, Tejun Heo t...@kernel.org wrote:
 Hello, Andy.

 On Mon, Jun 24, 2013 at 11:49:05AM -0700, Andy Lutomirski wrote:
  I have an idea where it should be headed in the long term but am not
  sure about short-term solution.  Given that the only sort wide-spread
  use case is virt kthreads, maybe it just needs to be special cased for
  now.  Not sure.

 I'll be okay (I think) if I can reliably set affinities of these
 threads.  I'm currently doing it with cgroups.

 That being said, I don't like the direction that kernel thread magic
 affinity is going.  It may be great for cache performance and reducing
 random bounding, but I have a scheduling-jitter-sensitive workload and
 I don't care about overall system throughput.  I need the kernel to
 stay the f!k off my important cpus, and arranging for this to happen
 is becoming increasingly complicated.

 Why is it becoming increasingly complicated?  The biggest change
 probably was the shared workqueue pool implementation but that was
 years ago and workqueue has grown pool attributes recently adding more
 properly designed flexibility and, for example, adding default
 affinity for !per-cpu workqueues should be pretty easy now.  But
 anyways, if it's an issue, it should be examined and properly solved
 rather than hacking up hacky solution with cgroup.

Because more things are becoming per cpu without the option of moving
of per-cpu things on behalf of one cpu to another cpu.  RCU is a nice
exception.


 cgroups are most certainly something that a binary can be aware of.
 It's not like a sysctl knob at all -- it's per process.  I have lots

 No, it definitely is not.  Sure it is more granular than sysctl but
 that's it.  It exposes control knobs which are directly tied into
 kernel implementation details.  It is not a properly designed
 programming API by any stretch of imagination.  It is an extreme
 failure on the kernel side that that part hasn't been made crystal
 clear from the beginning.  I don't know how intentional it was but the
 whole thing is completely botched.

 cgroup *never* was held to the standard necessary for any widely
 available API and many of the controls it exposes are exactly at the
 level of sysctls.  As the interface was filesystem, it could evade
 scrutiny and with the hierarchical organization also gave the
 impression that it's something which can be used directly by
 individual applications.  It found a loophole in the way we implement
 and police kernel APIs and then exploited it like there's no tomorrow.

 We are firmly bound to maintain what already has been exposed from the
 kernel side and I'm not gonna break any of them but the free-for-all
 cgroup is broken and deprecated.  It's gonna wither and fade away and
 any attempt to reverse that will be met with extreme prejudice.

The functionality I care about is that a program can reliably and
hierarchically subdivide system resources -- think rlimits but
actually useful.  I, and probably many other things, want this
functionality.  Yes, the current cgroup interface is awful, but it
gets one thing right: it's a hierarchy.

Back when my software ran on Windows, I used the awful job interface
to allocate resources among different parts of my software.  When I
switched to Linux, I lost some of that functionality and replaced
other bits with cgroups.  It's hackish, but it works.

Now we're apparently moving toward having a unified hierarchy
(great!), a more sane API (great!), and a nasty userspace situation
where systemd-using systems control the hierarchy through a highly
limiting systemd-specific interface and non-systemd systems do
something else which will presumably look nothing like what systemd
does.

I would argue that designing a kernel interface that requires exactly
one userspace component to manage it and ties that one userspace
component to something that can't easily be deployed everywhere (the
init system) is as big a cheat as the old approach of sneaking bad
APIs in through a filesystem was.

IOW, please, when designing this, please specify an API that programs
are permitted to use, and let that API be reviewed.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 12:37 PM, Tejun Heo t...@kernel.org wrote:
 Hello,

 On Mon, Jun 24, 2013 at 12:24:38PM -0700, Andy Lutomirski wrote:
 Because more things are becoming per cpu without the option of moving
 of per-cpu things on behalf of one cpu to another cpu.  RCU is a nice
 exception.

 Hmm... but in most cases it's per-cpu on the same cpu that initiated
 the task.  If a given CPU is just crunching numbers and IRQ affinity
 is properly configured, the CPU shouldn't be bothered too much by
 per-cpu work items.  If there are, please let us know.  We can hunt
 them down.

I'm not just crunching numbers -- I do (nonblocking) I/O as well.


 The functionality I care about is that a program can reliably and
 hierarchically subdivide system resources -- think rlimits but
 actually useful.  I, and probably many other things, want this
 functionality.  Yes, the current cgroup interface is awful, but it
 gets one thing right: it's a hierarchy.

 And the hierarchy support was completely broken for many resource
 controllers up until only several releases ago.

 I would argue that designing a kernel interface that requires exactly
 one userspace component to manage it and ties that one userspace
 component to something that can't easily be deployed everywhere (the
 init system) is as big a cheat as the old approach of sneaking bad
 APIs in through a filesystem was.

 In terms of API, it is firmly at the level of sysctl.  That's it.

 While I agree that having a proper kernel API for hierarchical
 resource management could be nice.  That currently is out of scope.
 We're already knee-deep in shit with the limited capabilities we're
 trying to implement.  Also, I really don't think cgroup is the right
 interface for such thing even if we get to that.  It should be part of
 the usual process/thread model, not this completely separate thing on
 the side.

 IOW, please, when designing this, please specify an API that programs
 are permitted to use, and let that API be reviewed.

 cgroup is not that API and it's never gonna be in all likelihood.  As
 for systemd vs. non-systemd compatibility, I'm afraid I don't have a
 good answer.  This is still all in a pretty earlly phase and the
 proper abstractions and APIs are being figured out.  Hopefully, we'll
 converge on a mostly compatible high-level abstraction which can be
 presented regardless of the actual base system implementation.


So what is cgroup for?  That is, what's the goal for what the new API
should be able to do?

AFAICT the main reason that systemd uses cgroup is to efficiently
track which service various processes came from and to send signals,
and it seems like that use case could be handled without cgroups at
all by creative use of subreapers and a syscall to broadcast a signal
to everything that has a given subreaper as an ancestor.  In that
case, systemd could be asked to stay away from cgroups even in the
single-hierarchy case.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 4:19 PM, Tejun Heo t...@kernel.org wrote:
 Hello,

 On Mon, Jun 24, 2013 at 04:01:07PM -0700, Andy Lutomirski wrote:
 So what is cgroup for?  That is, what's the goal for what the new API
 should be able to do?

 It is a for controlling and distributing resources.  That part doesn't
 change.  It's just not built to be used directly by individual
 applications.  It's an admin tool just like sysctl - be that admin be
 a human or userland base system.

 There's a huge chasm between something which can be generally used by
 normal applications and something which is restricted to admins and
 base systems in terms of interface generality and stability, security,
 how the abstractions fit together with the existing APIs and so on.
 cgroup firmly belongs to the former.  It still serves the same purpose
 but isn't, in a way, developed enough to be used directly by
 individual applications and I'm not even sure we want or need to
 develop it to such a level.

My application is running on a single-purpose system I administer.

I guess what I'm trying to say here is that many systems will rather
fundamentally use systemd.  Admins of those systems should still have
access to a reasonably large subset of cgroup functionality.  If the
single-hierarchy model is going to prevent going around systemd and if
systemd isn't going to expose all of the useful cgroup functionality,
then perhaps there should be a way to separate systemd's hierarchy
from the cgroup hierarchy.

Looking at http://0pointer.de/blog/projects/cgroups-vs-cgroups.html,
it looks like systemd doesn't actually need the cgroup resource
control functionality.  Maybe there's a way to disentangle this stuff.
 The /proc/pid/children feature that CRIU added seems like a decent
start.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 4:37 PM, Tejun Heo t...@kernel.org wrote:
 Hello, Andy.

 On Mon, Jun 24, 2013 at 04:27:17PM -0700, Andy Lutomirski wrote:
 I guess what I'm trying to say here is that many systems will rather
 fundamentally use systemd.  Admins of those systems should still have
 access to a reasonably large subset of cgroup functionality.  If the
 single-hierarchy model is going to prevent going around systemd and if
 systemd isn't going to expose all of the useful cgroup functionality,
 then perhaps there should be a way to separate systemd's hierarchy
 from the cgroup hierarchy.

 I don't think systemd will prevent you from buildling your own
 hierarchy on the side.  It sure won't be properly supported and things
 might break in corener cases / over time but if you're willing to take
 such risks anyway...  In the long term tho, what should happen
 probably is examining use cases like yours and then incorporating
 sensible mechanisms to support that into the base system
 infrastructure.  It might not be completely identical but I'm sure
 over time we'll be able to find what are the fundamental pieces and
 proper abstractions.  Right now, we're exposing way too much without
 even clearly understanding what are being enabled.  It is
 unsustainable.

Now I'm confused.  I thought that support for multiple hierarchies was
going away.  Is it here to stay after all?

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 4:40 PM, Tejun Heo t...@kernel.org wrote:
 Hello,

 On Mon, Jun 24, 2013 at 4:38 PM, Andy Lutomirski l...@amacapital.net wrote:
 Now I'm confused.  I thought that support for multiple hierarchies was
 going away.  Is it here to stay after all?

 It is going to be deprecated but also stay around for quite a while.
 That said, I didn' t mean to use multiple hierarchies. I was saying
 that if you build a sub-hierarchy in the unified hierarchy, you're
 likely to get away with it in most cases.

Isn't that exactly what I was originally asking for?  Quoting from
earlier in the thread:

On Mon, Jun 24, 2013 at 6:27 AM, Lennart Poettering
lenn...@poettering.net wrote:
 On Sat, 22.06.13 15:19, Andy Lutomirski (l...@amacapital.net) wrote:

 2. I manage services and tasks outside systemd (for one thing, I
 currently use Ubuntu, but even if I were on Fedora, I have a bunch
 of fine-grained things that figure out how they're supposed to
 allocate resources, and porting them to systemd just to keep working
 in the new world order would be a PITA [1]).


[...]


 I think that what I want are something like sub-unit cgroups -- I
 want to be able to ask systemd to further subdivide the group for my
 unit, login session, or whatever.  Would this be reasonable?
 (Another way of thinking of this is that a unit would have a whole
 cgroup hierarchy instead of just one cgroup.)

 The idea is not even to allow this. Basically, if you want to partitions
 your daemon into different cgroups you need to do that through systemd's
 abstractions: slices and services. To make this more palatable we'll
 introduce throw-away units though, so that you can dynamically run
 something as a workload and don't need to be concerned about naming
 this, or cleaning it up.


If I can subdivide my service in the hierarchy, then I'm happy.  If
this gets lost *and* systemd insists on controlling the one and only
cgroup hierarchy, then I think I have serious problems with the new
regime.

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [HEADSUP] cgroup changes

2013-06-24 Thread Andy Lutomirski
On Mon, Jun 24, 2013 at 4:57 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Mon, 24.06.13 16:01, Andy Lutomirski (l...@amacapital.net) wrote:

 AFAICT the main reason that systemd uses cgroup is to efficiently
 track which service various processes came from and to send signals,
 and it seems like that use case could be handled without cgroups at
 all by creative use of subreapers and a syscall to broadcast a signal
 to everything that has a given subreaper as an ancestor.  In that
 case, systemd could be asked to stay away from cgroups even in the
 single-hierarchy case.

 systemd uses cgroups to manage services. Managing services means many
 things. Among them: keeping track of processes, listing processes of a
 service, killing processes of a service, doing per-service logging
 (which means reliably, immediately, and race-freely tracing back
 messages to the service which logged them), about 55 other things, and
 also resource management.

 I don't see how I can do anything of this without something like
 cgroups, i.e. hierarchial, resource management involved systemd which
 allows me to securely put labels on processes.

Boneheaded straw-man proposal: two new syscalls and a few spare processes.

int sys_task_reaper(int tid): Returns the reaper for the task tid
(which is 1 if there's no subreaper).  (This could just as easily be a
file in /proc.)

int sys_killall_under_subreaper(int subreaper, int sig): Broadcasts
sig to all tasks under subreaper (excluding subreaper).  Guarantees
that, even if those tasks are forking, they all get the signal.

Then, when starting a service, systemd forks, sets the child to be a
subreaper, then forks that child again to exec the service.

Does this do everything that's needed?  sys_task_reaper is trivial to
implement (that functionality is already there in the reparenting
code), and sys_killall_under_subreaper is probably not so bad.


This has one main downside I can think of: it wastes a decent number
of processes (one subreaper per service).

--Andy
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel