Bug 470201 - (CVE-2008-5029) CVE-2008-5029 kernel: Unix sockets kernel panic

Status:	NEW

Product:	Security Response
Component:	vulnerability
Version:	unspecified
Platform:	All Linux

Priority:	high
Severity:	high
Target Milestone:	---
Assigned To:	Red Hat Security Response Team
QA Contact:

URL:
Whiteboard:	impact=important,source=netdev,report...
Keywords:	Security

Depends on:	470429 470430 470431 470432 470433 470434 470435 470436
Blocks:	CVE-2008-5300

Reported:	2008-11-06 04:07 EDT by Eugene Teo (Security Response Team)
Modified:	2009-01-13 04:16 EDT

Fixed In Version:
Release Notes:
Clone Of:

Attachments
Reproducer - http://darkircop.org/unix.c (2.51 KB, text/plain) 2008-11-06 04:08 EDT, Eugene Teo (Security Response Team)
potential fix for __scm_destroy() recursion (1.85 KB, patch) 2008-11-06 07:04 EDT, David Miller
Another reproducer - http://darkircop.org/unix2.c (2.67 KB, text/plain) 2008-11-07 06:22 EDT, Eugene Teo (Security Response Team)
second part of fix (6.53 KB, patch) 2008-11-11 04:31 EDT, David Miller
Implementation of David's suggestion (2.23 KB, patch) 2008-11-25 15:24 EDT, dann frazier
Proposed patch for real-time kernel (2.49 KB, patch) 2008-11-27 01:00 EDT, Eugene Teo (Security Response Team)

Description From Eugene Teo (Security Response Team) 2008-11-06 04:07:59 EDT

http://marc.info/?l=linux-netdev&m=122593044330973&w=2


"The following code causes a kernel panic on Linux 2.6.26:
http://darkircop.org/unix.c

I haven't investigated the bug so I'm not sure what is causing it, and don't
know if it's exploitable.  The code passes unix sockets from one process to
another using unix sockets.  The bug probably has to do with closing file
descriptors."
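
For readers unfamiliar with the mechanism the report refers to: a file descriptor (including a Unix socket) is passed to another process as SCM_RIGHTS ancillary data on an AF_UNIX socket. The following is a minimal sketch of that mechanism only. It is not the attached reproducer, and the helper names send_fd() and recv_fd() are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one descriptor over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd_to_send)
{
    char data = 'x';  /* one byte of ordinary data carries the control message */
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;  /* forces correct alignment of buf */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;  /* payload is an array of descriptors */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

While a descriptor sent this way has not yet been received, the kernel holds a reference to it in the receiving socket's queue. That "in flight" state is what the later comments about __scm_destroy() and AF_UNIX garbage collection deal with.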

------- Comment #1 From Eugene Teo (Security Response Team) 2008-11-06 04:08:41 EDT -------

Created an attachment (id=322676)
Reproducer - http://darkircop.org/unix.c

------- Comment #3 From David Miller 2008-11-06 07:04:03 EDT -------

Every Linux kernel is vulnerable to this as far as I can tell.

The problem is that __scm_destroy() can close a socket via fput()
which can lead back into __scm_destroy() and so on and so forth.

I'll attach the patch I'm currently testing, it's based upon a
suggested implementation from Linus.
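
The recursion described in this comment can be illustrated with a toy user-space model. This is a sketch under stated assumptions, not the kernel code and not the attached patch: struct toy_sock and all function names are hypothetical, and each toy socket holds at most one in-flight socket. destroy_recursive() nests one call per queued socket, while destroy_deferred() shows one general way to flatten such recursion by queueing nested destroys on a work list that the outermost call drains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket with one in-flight socket queued in it. */
struct toy_sock {
    struct toy_sock *in_flight;  /* socket queued inside this one, or NULL */
    struct toy_sock *next_work;  /* link used by the deferred work list */
    int destroyed;
};

int depth;      /* current nesting of destroy calls */
int max_depth;  /* deepest nesting observed */

static void enter(void) { if (++depth > max_depth) max_depth = depth; }
static void leave(void) { assert(depth > 0); depth--; }

/* Chain the sockets: socks[0] holds socks[1], which holds socks[2], ... */
void build_chain(struct toy_sock *socks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        socks[i].in_flight = (i + 1 < n) ? &socks[i + 1] : NULL;
        socks[i].next_work = NULL;
        socks[i].destroyed = 0;
    }
}

/* Problem shape: closing a socket destroys what it holds, which closes
 * another socket, and so on. Stack depth grows with the chain length. */
void destroy_recursive(struct toy_sock *s)
{
    enter();
    s->destroyed = 1;
    if (s->in_flight != NULL)
        destroy_recursive(s->in_flight);
    leave();
}

static struct toy_sock *work_list;  /* pending destroys */
static int draining;                /* nonzero while an outer call is looping */

/* Flattened shape: a nested call only queues its argument and returns;
 * the outermost call processes the queue in a loop. */
void destroy_deferred(struct toy_sock *s)
{
    enter();
    if (draining) {
        s->next_work = work_list;
        work_list = s;
        leave();
        return;
    }
    draining = 1;
    s->next_work = NULL;
    work_list = s;
    while (work_list != NULL) {
        struct toy_sock *cur = work_list;
        work_list = cur->next_work;
        cur->destroyed = 1;
        if (cur->in_flight != NULL)
            destroy_deferred(cur->in_flight);  /* queues, does not recurse */
    }
    draining = 0;
    leave();
}
```

In this model the recursive version reaches a nesting depth equal to the chain length, whereas the deferred version never nests more than two calls deep regardless of how long the chain is.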

------- Comment #4 From David Miller 2008-11-06 07:04:40 EDT -------

Created an attachment (id=322702)
potential fix for __scm_destroy() recursion

------- Comment #6 From Eugene Teo (Security Response Team) 2008-11-07 01:26:40 EDT -------

I managed to reproduce the problem easily on:
kernel-rt-2.6.24.7-91.el5rt.i686
kernel-2.6.9-78.0.8.EL.i686

I had a little problem reproducing it on kernel-2.6.18-92.1.17.el5.i686, but a
while loop helps.

------- Comment #7 From Eugene Teo (Security Response Team) 2008-11-07 06:22:42 EDT -------

Created an attachment (id=322845)
Another reproducer - http://darkircop.org/unix2.c

------- Comment #9 From Eugene Teo (Security Response Team) 2008-11-09 21:49:07 EDT -------

Upstream commits:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f8d570a
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3b53fbf

Luis, please ensure that the patch you added to -92 is the same one as
f8d570a/3b53fbf. Thanks.

------- Comment #10 From Eugene Teo (Security Response Team) 2008-11-09 23:49:20 EDT -------

(In reply to comment #9)
> Upstream commits:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f8d570a
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3b53fbf

Dave, looks like Andrea is still seeing problems with this patch?

http://marc.info/?l=linux-netdev&m=122598444310928&w=2

Thanks, Eugene

------- Comment #13 From David Miller 2008-11-11 04:31:21 EDT -------

Created an attachment (id=323161)
second part of fix

As well as the __scm_destroy() recursion patch, this fix
for AF_UNIX garbage collection is needed to cure all of the
discovered problems.

------- Comment #14 From David Miller 2008-11-11 04:32:01 EDT -------

Andrea's problems are fully resolved if the __scm_destroy() and
the AF_UNIX garbage collector patch are both applied.

------- Comment #16 From Eugene Teo (Security Response Team) 2008-11-11 23:53:45 EDT -------

(In reply to comment #13)
> Created an attachment (id=323161)
> second part of fix
> 
> As well as the __scm_destroy() recursion patch, this fix
> for AF_UNIX garbage collection is needed to cure all of the
> discovered problems.

This is:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6209344

------- Comment #17 From Neil Horman 2008-11-12 11:36:27 EDT -------

Note this is a prereq patch for the other 2:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=1fd05ba5a2f2aa8e7b9b52ef55df850e2e7d54c9

------- Comment #18 From Neil Horman 2008-11-12 15:58:26 EDT -------

FWIW (should have done this earlier), I'm trying the test case on a 122.el5
kernel and it's not crashing.  sendmsg always fails with -EPIPE (which is
odd, given that the socket was created with socketpair).  Investigating why.

------- Comment #19 From Neil Horman 2008-11-12 16:05:36 EDT -------

Scratch that, it just took several tries to get it to lock up the system.

------- Comment #20 From Eugene Teo (Security Response Team) 2008-11-21 21:05:01 EDT -------

From dann frazier on the oss-security list:

"Thanks for following up.

fyi, our testing of this fix has uncovered additional issues.
Local/unprivileged users can cause soft lockups and take out system
processes by triggering the OOM killer:
 http://marc.info/?l=linux-netdev&m=122721862313564&w=2"

Dave, take note.

------- Comment #22 From Eugene Teo (Security Response Team) 2008-11-21 21:50:04 EDT -------

(In reply to comment #20)
> From dann frazier in oss-security list:
> 
> "Thanks for following up.
> 
> fyi, our testing of this fix has uncovered additional issues.
> Local/unprivileged users can cause soft lockups and take out system
> processes by triggering the OOM killer:
>  http://marc.info/?l=linux-netdev&m=122721862313564&w=2"

Bug reported at:
http://marc.info/?l=linux-netdev&m=122721862313564&w=2

------- Comment #23 From Eugene Teo (Security Response Team) 2008-11-24 22:42:37 EDT -------

I tested 2.6.24.7-94.el5rt x86_64 by running unix or unix2 in a loop. It can
invoke the oom-killer pretty quickly, but I did not see the soft lockups that
Dann observed. Dave, any comments?

---
master invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
Pid: 1798, comm: master Not tainted 2.6.24.7-94.el5rt #1

Call Trace:
 [<ffffffff81087cca>] out_of_memory+0x9d/0x2cb
 [<ffffffff8108acd5>] __alloc_pages+0x27d/0x312
 [<ffffffff810a3a44>] alloc_page_vma+0xb7/0xc6
 [<ffffffff8109e36c>] read_swap_cache_async+0x4f/0x103
 [<ffffffff81093d45>] swapin_readahead+0x61/0xcd
 [<ffffffff810952c8>] handle_mm_fault+0x408/0x764
 [<ffffffff81289ec0>] do_page_fault+0x3ba/0x76d
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff812882d9>] error_exit+0x0/0x51
 [<ffffffff8113c7fd>] ? copy_user_generic_string+0x2d/0x40
 [<ffffffff810bde13>] ? core_sys_select+0x200/0x275
 [<ffffffff81056cd4>] ? getnstimeofday+0x31/0x88
 [<ffffffff8113a2d0>] ? rb_insert_color+0x68/0xe3
 [<ffffffff81041b34>] ? timespec_add_safe+0x37/0x64
 [<ffffffff8105401e>] ? enqueue_hrtimer+0xda/0xe8
 [<ffffffff81054c41>] ? ktime_get_ts+0x46/0x4b
 [<ffffffff810be03f>] ? sys_select+0x7e/0xa6
 [<ffffffff8100c22e>] ? system_call_ret+0x0/0x5

Node 0 DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd: 161   Cold: hi:   62, btch:  15 usd:  56
Active:9 inactive:32 dirty:0 writeback:0 unstable:0
 free:1174 slab:122808 mapped:1 pagetables:377 bounce:0
Node 0 DMA free:1988kB min:52kB low:64kB high:76kB active:0kB inactive:0kB present:9696kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 484 484 484
Node 0 DMA32 free:2708kB min:2788kB low:3484kB high:4180kB active:156kB inactive:0kB present:495940kB pages_scanned:174218 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1988kB
Node 0 DMA32: 17*4kB 0*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2708kB
Swap cache: add 1952, delete 1952, find 4/6, race 0+0
Free swap  = 1040760kB
Total swap = 1048568kB
master invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
Pid: 1798, comm: master Not tainted 2.6.24.7-94.el5rt #1

Call Trace:
 [<ffffffff8108782e>] oom_kill_process+0x58/0xfe
 [<ffffffff81087e58>] out_of_memory+0x22b/0x2cb
 [<ffffffff8108acd5>] __alloc_pages+0x27d/0x312
 [<ffffffff810a3a44>] alloc_page_vma+0xb7/0xc6
 [<ffffffff8109e36c>] read_swap_cache_async+0x4f/0x103
 [<ffffffff81093d45>] swapin_readahead+0x61/0xcd
 [<ffffffff810952c8>] handle_mm_fault+0x408/0x764
 [<ffffffff81289ec0>] do_page_fault+0x3ba/0x76d
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff810336d4>] ? default_wake_function+0x0/0x14
 [<ffffffff812882d9>] error_exit+0x0/0x51
 [<ffffffff8113c7fd>] ? copy_user_generic_string+0x2d/0x40
 [<ffffffff810bde13>] ? core_sys_select+0x200/0x275
 [<ffffffff81056cd4>] ? getnstimeofday+0x31/0x88
 [<ffffffff8113a2d0>] ? rb_insert_color+0x68/0xe3
 [<ffffffff81041b34>] ? timespec_add_safe+0x37/0x64
 [<ffffffff8105401e>] ? enqueue_hrtimer+0xda/0xe8
 [<ffffffff81054c41>] ? ktime_get_ts+0x46/0x4b
 [<ffffffff810be03f>] ? sys_select+0x7e/0xa6
 [<ffffffff8100c22e>] ? system_call_ret+0x0/0x5

Node 0 DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd: 164   Cold: hi:   62, btch:  15 usd:  59
Active:9 inactive:32 dirty:0 writeback:0 unstable:0
 free:1188 slab:122759 mapped:1 pagetables:377 bounce:0
Node 0 DMA free:1988kB min:52kB low:64kB high:76kB active:0kB inactive:0kB present:9696kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 484 484 484
Node 0 DMA32 free:2764kB min:2788kB low:3484kB high:4180kB active:156kB inactive:0kB present:495940kB pages_scanned:622 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1988kB
Node 0 DMA32: 15*4kB 3*8kB 2*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2772kB
Swap cache: add 1968, delete 1968, find 5/8, race 0+0
Free swap  = 1040760kB
Total swap = 1048568kB
[...]

------- Comment #24 From David Miller 2008-11-25 05:58:08 EDT -------

I have never seen the OOM killer trigger; what I did see is that
the program could get stuck while still being killable with Ctrl-C.

The problem is that the child processes can still queue new
FDs over the AF_UNIX socket to the parent's side while the
parent is exit()'ing and (via exit-time FD closing) running
UNIX garbage collection on those FDs.

There is no easy way at all to fix this.  There isn't something
like a one-to-one relationship between sockets and processes,
there is rather potentially a many-to-one relationship.  So ideas
like "don't allow sending FD over AF_UNIX socket for process that
is exit()'ing" are totally out of the question.

One idea that might work, however, is to throttle when UNIX garbage
collection is in progress.  I can't say how easy the implementation
would be.

The following might work:

1) Add wait_queue to net/unix/garbage.c
2) Create a helper function that sleeps until gc_in_progress is false
3) At the end of unix_gc() where gc_in_progress is cleared to false,
   perform a wakeup on the waitq added in #1
4) At all net/unix/af_unix.c calls of scm_send(), first invoke the
   "wait until gc_in_progress==false" helper added in #2

This should make sendmsg() calls block while any UNIX garbage collection
is in progress.  Note that this will kill scalability in the case where
many UNIX sockets are being closed while many other UNIX sockets are
doing SCM fp passing.

I don't know how common that is, probably not enough to care.
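
The four steps above can be sketched in user space with a mutex and a condition variable standing in for the kernel wait queue. This is an analogue under stated assumptions, not kernel code: the pthread primitives replace the wait queue and wakeup, and the names gc_begin(), gc_end(), gc_running(), wait_for_gc() and toy_send() are hypothetical. Only gc_in_progress is a name taken from the comment.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
/* Step 1: the "wait queue" that senders sleep on. */
static pthread_cond_t gc_done = PTHREAD_COND_INITIALIZER;
static bool gc_in_progress;

/* Mark the start of a collection pass (stand-in for entering unix_gc()). */
void gc_begin(void)
{
    pthread_mutex_lock(&gc_lock);
    gc_in_progress = true;
    pthread_mutex_unlock(&gc_lock);
}

/* Step 3: clear the flag at the end of the pass and wake every sleeper. */
void gc_end(void)
{
    pthread_mutex_lock(&gc_lock);
    gc_in_progress = false;
    pthread_cond_broadcast(&gc_done);
    pthread_mutex_unlock(&gc_lock);
}

/* Report whether a collection pass is currently marked as running. */
bool gc_running(void)
{
    bool running;

    pthread_mutex_lock(&gc_lock);
    running = gc_in_progress;
    pthread_mutex_unlock(&gc_lock);
    return running;
}

/* Step 2: helper that sleeps until gc_in_progress is false. */
void wait_for_gc(void)
{
    pthread_mutex_lock(&gc_lock);
    while (gc_in_progress)
        pthread_cond_wait(&gc_done, &gc_lock);
    pthread_mutex_unlock(&gc_lock);
}

/* Step 4: a sender waits for any running collection before it queues
 * a descriptor. The counter stands in for the real send work. */
int toy_send(int *sent_counter)
{
    wait_for_gc();
    ++*sent_counter;
    return 0;
}
```

As in the proposal, the wait only throttles senders. A new collection can still start right after wait_for_gc() returns, so this bounds the backlog rather than excluding concurrent sends entirely.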

------- Comment #25 From dann frazier 2008-11-25 15:24:33 EDT -------

Created an attachment (id=324662)
Implementation of David's suggestion

Here's my attempt at implementing David's suggestion. I've been running this
for an hour or so now and haven't had a soft lockup or oom-killer trigger yet.

------- Comment #26 From David Miller 2008-11-25 17:25:57 EDT -------

Patch looks mostly fine, could you please post this to netdev
with proper commit message and signoff?

I'd like to get this fixed upstream.

Thanks Dann.

------- Comment #27 From dann frazier 2008-11-25 18:30:38 EDT -------

Sent:
 http://marc.info/?l=linux-netdev&m=122765505415944&w=2

------- Comment #28 From Eugene Teo (Security Response Team) 2008-11-26 20:14:56 EDT -------

(In reply to comment #27)
> Sent:
>  http://marc.info/?l=linux-netdev&m=122765505415944&w=2

Updated patch:
http://marc.info/?l=linux-netdev&m=122771908731133&w=2

------- Comment #33 From Eugene Teo (Security Response Team) 2008-11-27 08:06:54 EDT -------

(In reply to comment #28)
> (In reply to comment #27)
> > Sent:
> >  http://marc.info/?l=linux-netdev&m=122765505415944&w=2
> 
> Updated patch:
> http://marc.info/?l=linux-netdev&m=122771908731133&w=2

This is a different bug triggered by the same reproducers. I have filed a new
bug for this. Please refer to bug 473259. Thanks.

------- Comment #34 From Jan Lieskovsky 2008-12-09 10:59:04 EDT -------

Debian mention of this issue:

http://security-tracker.debian.net/tracker/CVE-2008-5029

------- Comment #35 From Eugene Teo (Security Response Team) 2009-01-05 01:05:16 EDT -------

A user posted an exploit[1] to bugtraq last Friday. It is the same reproducer
as the one posted in comment #1. SecurityFocus listed it as a new vulnerability
-- Linux Kernel Malformed 'msghdr' Structure Local Denial of Service[2]. This
is incorrect, and it should be CVE-2008-5029. Take note.

[1] http://seclists.org/bugtraq/2009/Jan/0000.html
[2] http://www.securityfocus.com/bid/33079/info
