"The following code causes a kernel panic on Linux 2.6.26:
http://darkircop.org/unix.c
I haven't investigated the bug so I'm not sure what is causing it, and don't
know if it's exploitable. The code passes unix sockets from one process to
another using unix sockets. The bug probably has to do with closing file
descriptors."
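For readers unfamiliar with the mechanism the report describes, here is a minimal sketch of passing one descriptor over an AF_UNIX socket with SCM_RIGHTS. It is not the linked unix.c and does not try to reproduce the crash. The helper names (send_fd, recv_fd, fd_passing_demo) are made up for illustration, and everything runs in one process to keep it short.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Hypothetical helper: send descriptor fd over AF_UNIX socket chan. */
static int send_fd(int chan, int fd)
{
    char byte = 'x';   /* one byte of ordinary data accompanies the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl u;
    struct msghdr msg;

    memset(&u, 0, sizeof u);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof fd);

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from chan, or -1 on error. */
static int recv_fd(int chan)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl u;
    struct msghdr msg;
    int fd;

    memset(&u, 0, sizeof u);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(chan, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}

/* Pass one end of a second socketpair over the first; return 0 on success. */
int fd_passing_demo(void)
{
    int chan[2], payload[2];
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, payload) < 0)
        return -1;
    if (send_fd(chan[0], payload[0]) < 0)
        return -1;
    int got = recv_fd(chan[1]);
    if (got < 0)
        return -1;
    assert(got != payload[0]);   /* new descriptor number, same socket */

    /* Data written to the peer is readable through the received fd. */
    int ok = write(payload[1], "k", 1) == 1 &&
             read(got, &c, 1) == 1 && c == 'k';
    close(got);
    close(payload[0]); close(payload[1]);
    close(chan[0]); close(chan[1]);
    return ok ? 0 : -1;
}
```

Calling fd_passing_demo() from a main() should return 0. Between sendmsg() and recvmsg() the passed socket is "in flight": the only new reference to it is held by a queued message. That is the state the rest of this thread is concerned with.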
-------Comment #3
>From David Miller 2008-11-06 07:04:03 EDT-------
Every Linux kernel is vulnerable to this as far as I can tell.
The problem is that __scm_destroy() can close a socket via fput()
which can lead back into __scm_destroy() and so on and so forth.
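To make the recursion concrete, here is a user-space model, not kernel code and not the attached patch. Each hypothetical node stands in for a socket whose receive queue holds a reference to one other socket. Dropping the last reference destroys the node, which drops the reference it held, and so on. The sketch avoids nesting by deferring inner destroys to a work list, which is one way to flatten such a chain. Whether the attachment does it this way is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket with one passed fd queued on it. */
struct node {
    int refs;
    struct node *inflight;   /* reference held by a queued message, or NULL */
    struct node *next_work;  /* link used while on the deferred work list */
};

static struct node *work_list;  /* destroys deferred by the outermost put */
static int destroying;          /* non-zero while a destroy is running */
static int depth, max_depth;    /* instrumentation: nesting of destroys */
static int destroyed;           /* instrumentation: nodes freed */

static void node_put(struct node *n);

static void node_destroy(struct node *n)
{
    struct node *child = n->inflight;

    if (++depth > max_depth)
        max_depth = depth;
    free(n);
    destroyed++;
    if (child)
        node_put(child);   /* the step that would otherwise recurse */
    depth--;
}

static void node_put(struct node *n)
{
    if (--n->refs > 0)
        return;
    if (destroying) {      /* already inside a destroy: queue, do not nest */
        n->next_work = work_list;
        work_list = n;
        return;
    }
    destroying = 1;
    node_destroy(n);
    while (work_list) {    /* drain deferred destroys iteratively */
        struct node *w = work_list;
        work_list = w->next_work;
        node_destroy(w);
    }
    destroying = 0;
}

/* Build a chain in which each node holds the only reference to the next. */
static struct node *make_chain(int len)
{
    struct node *head = NULL;

    for (int i = 0; i < len; i++) {
        struct node *n = calloc(1, sizeof *n);
        if (!n)
            abort();
        n->refs = 1;
        n->inflight = head;
        head = n;
    }
    return head;
}

/* Destroy a chain of len nodes; return the deepest destroy nesting seen. */
int destroy_chain_max_depth(int len)
{
    if (len <= 0)
        return 0;
    depth = max_depth = destroyed = 0;
    node_put(make_chain(len));
    assert(destroyed == len);
    return max_depth;
}
```

With the work list in place, the nesting depth stays at 1 however long the chain is. Without the `destroying` check, node_destroy() and node_put() would call each other once per link, so stack use would grow with the length of the chain.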
I'll attach the patch I'm currently testing, it's based upon a
suggested implementation from Linus.
-------Comment #4
>From David Miller 2008-11-06 07:04:40 EDT-------
I managed to reproduce the problem easily on:
kernel-rt-2.6.24.7-91.el5rt.i686
kernel-2.6.9-78.0.8.EL.i686
I had a little problem reproducing it on kernel-2.6.18-92.1.17.el5.i686, but a
while loop helps.
-------Comment #13
>From David Miller 2008-11-11 04:31:21 EDT-------
Created an attachment (id=323161)[details]
second part of fix
As well as the __scm_destroy() recursion patch, this fix
for AF_UNIX garbage collection is needed to cure all of the
discovered problems.
-------Comment #14
>From David Miller 2008-11-11 04:32:01 EDT-------
Andrea's problems are fully resolved if the __scm_destroy() and
the AF_UNIX garbage collector patch are both applied.
-------Comment #18
>From Neil Horman 2008-11-12 15:58:26 EDT-------
FWIW (should have done this earlier), I'm trying the test case on a 122.el5
kernel and it's not crashing. sendmsg always fails with an -EPIPE (which is
odd, given that it was created with socketpair). Investigating as to why.
-------Comment #19
>From Neil Horman 2008-11-12 16:05:36 EDT-------
Scratch that, it just took several tries to get it to lock up the system.
From dann frazier in oss-security list:
"Thanks for following up.
fyi, our testing of this fix has uncovered additional issues.
Local/unprivileged users can cause soft lockups and take out system
processes by triggering the OOM killer:
http://marc.info/?l=linux-netdev&m=122721862313564&w=2"
Dave, take note.
-------Comment #24
>From David Miller 2008-11-25 05:58:08 EDT-------
I had never seen the OOM killer triggers, but rather I did see that
the program could get stuck but be killable still by Ctrl-C.
The problem is that the child processes can still queue new
FDs over the AF_UNIX socket to the parent's side, while the
parent is exit()'ing and (via exit time FD closing) running
UNIX garbage collection on those FDs.
There is no easy way at all to fix this. There isn't something
like a one-to-one relationship between sockets and processes,
there is rather potentially a many-to-one relationship. So ideas
like "don't allow sending FD over AF_UNIX socket for process that
is exit()'ing" are totally out of the question.
One idea that might work, however, is to throttle when UNIX garbage
collection is in progress. I can't say how easy the implementation
would be.
The following might work:
1) Add wait_queue to net/unix/garbage.c
2) Create a helper function that sleeps until gc_in_progress is false
3) At the end of unix_gc() where gc_in_progress is cleared to false,
perform a wakeup on the waitq added in #1
4) At all net/unix/af_unix.c calls of scm_send(), first invoke the
"wait until gc_in_progress==false" helper added in #2
This should make sendmsg()'s block while any UNIX garbage collection
is in progress. Note that this will kill scalability in the case where
many UNIX sockets are being closed while many other UNIX sockets are
doing SCM fp passing.
I don't know how common that is, probably not enough to care.
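The four steps above can be sketched in user space with a mutex and condition variable standing in for the kernel wait queue. This is an analogy under stated assumptions, not the kernel change: wait_for_gc(), gc_end(), sender() and gc_throttle_demo() are illustrative names, and the "send" is only a counter. The demo keeps the pretend collection open until every sender is parked at the gate, then releases them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gc_wait = PTHREAD_COND_INITIALIZER;  /* step 1 */
static pthread_cond_t arrived = PTHREAD_COND_INITIALIZER;  /* demo only */
static bool gc_in_progress;
static int blocked;          /* demo only: senders that had to wait */
static int sends_during_gc;  /* demo only: should stay 0 */

/* Step 2: sleep until no collection is in progress. */
static void wait_for_gc(void)
{
    pthread_mutex_lock(&gc_lock);
    if (gc_in_progress) {
        blocked++;
        pthread_cond_signal(&arrived);
    }
    while (gc_in_progress)
        pthread_cond_wait(&gc_wait, &gc_lock);
    pthread_mutex_unlock(&gc_lock);
}

/* Step 3: clear the flag and wake everyone parked in wait_for_gc(). */
static void gc_end(void)
{
    pthread_mutex_lock(&gc_lock);
    assert(gc_in_progress);
    gc_in_progress = false;
    pthread_cond_broadcast(&gc_wait);
    pthread_mutex_unlock(&gc_lock);
}

/* Step 4: every send path goes through the gate first. */
static void *sender(void *arg)
{
    (void)arg;
    wait_for_gc();
    pthread_mutex_lock(&gc_lock);   /* stand-in for the real send */
    if (gc_in_progress)
        sends_during_gc++;
    pthread_mutex_unlock(&gc_lock);
    return NULL;
}

/* Returns the number of sends that overlapped the collection, or -1. */
int gc_throttle_demo(void)
{
    pthread_t t[4];
    int started = 0;

    pthread_mutex_lock(&gc_lock);
    gc_in_progress = true;          /* pretend a collection has begun */
    blocked = sends_during_gc = 0;
    pthread_mutex_unlock(&gc_lock);

    for (int i = 0; i < 4; i++)
        if (pthread_create(&t[started], NULL, sender, NULL) == 0)
            started++;

    /* Hold the collection open until every sender is parked. */
    pthread_mutex_lock(&gc_lock);
    while (blocked < started)
        pthread_cond_wait(&arrived, &gc_lock);
    pthread_mutex_unlock(&gc_lock);

    gc_end();
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return started ? sends_during_gc : -1;
}
```

The gate is a throttle, not mutual exclusion: a new collection could start right after a sender passes wait_for_gc(). The demo only runs one collection, so it does not exercise that window. Older toolchains may need -pthread when linking.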
-------Comment #25
>From dann frazier 2008-11-25 15:24:33 EDT-------
Created an attachment (id=324662)[details]
Implementation of David's suggestion
Here's my attempt at implementing David's suggestion. I've been running this
for an hour or so now and haven't had a soft lockup or oom-killer trigger yet.
-------Comment #26
>From David Miller 2008-11-25 17:25:57 EDT-------
Patch looks mostly fine, could you please post this to netdev
with proper commit message and signoff?
I'd like to get this fixed upstream.
Thanks Dann.
-------Comment #27
>From dann frazier 2008-11-25 18:30:38 EDT-------
Created an attachment (id=322702) [details]
potential fix for __scm_destroy() recursion
Created an attachment (id=323161) [details]
second part of fix
Created an attachment (id=324662) [details]
Implementation of David's suggestion