Exploiting races in system call wrappers

By Jake Edge
August 15, 2007

A technique that is often used by security software, and has historically been a source of security holes, has once again been shown to be exploitable on many systems. Research recently presented by Robert N.M. Watson at the USENIX Workshop on Offensive Technologies (WOOT07) demonstrates race conditions in software that uses "system call wrapping" (or "hooking"). The race conditions can be exploited to circumvent the protections that the software is supposed to provide. Well-behaved Linux software is not vulnerable, but other free operating systems do allow, and even encourage, the practice.

There are several different ways to implement wrappers, but at the core, a wrapper is kernel code that intercepts system calls from all applications, running its own code before and after the real system call. The wrapper code can see and modify all of the arguments being passed to and from the system call. This technique can be used to enforce various policies on the use of the system calls, denying or sharply restricting access. Wrappers can also be used to log all system call activity for audit trail purposes.

Anti-virus and intrusion detection and prevention tools are the kinds of applications that use system call wrapping. A typical task is to intercept all calls to open(), check the file for viruses or illegal access, and return an error if the check fails. Notable users of system call wrappers are the OpenBSD and NetBSD Systrace facility, the Generic Software Wrappers Toolkit and the CerbNG firewall for FreeBSD.

Thus, intercepting system calls is a technique that is useful, but not without hazards. These recent vulnerabilities are endemic to the technique, not tied to a specific implementation. They exploit that bugaboo of system programmers everywhere: the race condition. Specifically, they are time-of-check-to-time-of-use (TOCTTOU) or other, similar, bugs.

A TOCTTOU exploit abuses the gap in time between the test for a condition and the use of an object that passes the test. If the object is changed in that gap, the restrictions that were supposed to be enforced by the test can be bypassed. The classic example is a setuid program that tests a file for legal access by the real user, typically with access(), before opening it. If, after the test but before the open(), the user replaces the file with a symlink to a file they can't legally access, they have circumvented the security check.
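The check-then-use gap can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from Watson's paper: the names checked_read and demo are hypothetical, a "file must live inside an allowed directory" policy stands in for the real-user access test, and the between_check_and_use hook makes the attacker's move happen at a fixed point so the outcome is repeatable. On a real system the swap would have to win a race. The sketch assumes a platform where os.symlink() is available to the user.

```python
import os
import tempfile


def checked_read(path, allowed_dir, between_check_and_use=None):
    """Check that `path` resolves inside `allowed_dir`, then open it."""
    # Time of check: resolve symlinks and test the (hypothetical) policy.
    real = os.path.realpath(path)
    if not real.startswith(os.path.realpath(allowed_dir) + os.sep):
        raise PermissionError(path)
    # The gap: on a real system another process may run here.
    if between_check_and_use is not None:
        between_check_and_use()
    # Time of use: open() resolves the path again, from scratch.
    with open(path) as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as root:
        allowed = os.path.join(root, "allowed")
        os.mkdir(allowed)
        secret = os.path.join(root, "secret.txt")
        with open(secret, "w") as f:
            f.write("secret")
        target = os.path.join(allowed, "data.txt")
        with open(target, "w") as f:
            f.write("harmless")

        def swap():
            # Attacker: replace the checked file with a symlink to the secret.
            os.remove(target)
            os.symlink(secret, target)

        honest = checked_read(target, allowed)
        raced = checked_read(target, allowed, between_check_and_use=swap)
        # With the symlink already in place, the same check refuses the path.
        try:
            checked_read(target, allowed)
            late = "allowed"
        except PermissionError:
            late = "denied"
        return honest, raced, late
```

The third call shows that the check itself is sound: it rejects the symlink when it sees it. The failure comes purely from the path being resolved twice, once for the check and once for the open.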

Two similar race conditions have been identified for applications using system call wrappers: time-of-audit-to-time-of-use (TOATTOU) and time-of-replacement-to-time-of-use (TORTTOU). In both cases, the data that gets passed to the system call is manipulated. For TOATTOU, it is done to obscure the data from any auditing or logging that might be done, covering the tracks of an exploit from an intrusion detection application for example. In the TORTTOU case, if the data passed into the system call is changed by the wrapper, to implement "jail" functionality for instance, the exploit changes it back before the system call is made.
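Both variants can be modelled with a toy wrapper. This is a minimal sketch under stated assumptions: UserMemory, wrapped_open and the in_the_gap hook are hypothetical names, a plain Python object stands in for the caller's address space, and the attacker's write is injected at a fixed point rather than raced. It does not reflect the interface of any real wrapper package.

```python
class UserMemory:
    """Stand-in for an argument buffer in the calling process's memory."""

    def __init__(self, path):
        self.path = path


def wrapped_open(mem, audit_log, jail, in_the_gap=None):
    # Wrapper step 1: audit the argument as it looks right now.
    audit_log.append(mem.path)
    # Wrapper step 2: rewrite the argument so it points inside the jail.
    # The replacement is stored where the caller can still reach it.
    mem.path = jail + mem.path
    # The gap between the wrapper and the real system call.
    if in_the_gap is not None:
        in_the_gap(mem)
    # Real system call: fetches the argument from user memory again.
    return "opened " + mem.path


def demo():
    log = []

    # No interference: the jail rewrite takes effect.
    honest = wrapped_open(UserMemory("/etc/passwd"), log, "/jail")

    # TORTTOU: the attacker undoes the wrapper's replacement.
    def undo(mem):
        mem.path = "/etc/passwd"

    escaped = wrapped_open(UserMemory("/etc/passwd"), log, "/jail", undo)

    # TOATTOU: the log records a harmless path, the call uses another one.
    def retarget(mem):
        mem.path = "/etc/shadow"

    hidden = wrapped_open(UserMemory("/tmp/notes"), log, "/jail", retarget)
    return honest, escaped, hidden, log
```

In the last case the audit log contains only "/tmp/notes", while the call that actually ran opened "/etc/shadow".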

In his paper, "Exploiting Concurrency Vulnerabilities in System Call Wrappers" (PDF), Watson shows techniques to reliably exploit the race conditions in a variety of packages that use system call wrappers. On both single and multi-processor systems, mechanisms were found to exploit the time gap, because system calls, especially with wrappers, are not atomic operations.

For single processor systems, one of his examples used data that had its last byte on a swapped-out page. While the kernel sleeps, waiting for the page to be swapped in, another process can change the data that has already been read. For multiprocessor systems, the windows are typically smaller, but it is not necessary to arrange for the kernel to sleep; a thread on a different processor can be used to alter the data. The main problem in that case is synchronizing with the kernel so that the exploit knows when to change the data. Watson found several synchronization methods; one very simple one just spins, waiting for the data to change, and then changes it back, effecting a TORTTOU exploit.
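The spinning approach can be imitated with two Python threads. This is a rough sketch, not Watson's exploit code: run_torttou and the shared dictionary are hypothetical stand-ins for the wrapper and for user memory, and the race window is held open artificially with an Event so that the result is repeatable. A real attacker gets no such help and has to fit into a much narrower window.

```python
import threading


def run_torttou(original=b"/etc/passwd", replacement=b"/jail/etc/passwd"):
    shared = {"path": original}  # stands in for memory the caller can write
    attacker_done = threading.Event()

    def attacker():
        # Spin until the wrapper's replacement becomes visible...
        while shared["path"] == original:
            pass
        # ...then immediately put the original value back.
        shared["path"] = original
        attacker_done.set()

    t = threading.Thread(target=attacker)
    t.start()

    # Wrapper: store the replacement where the caller can still reach it.
    shared["path"] = replacement
    # Race window, kept open here until the attacker has acted (assumption
    # made only so the sketch is deterministic).
    attacker_done.wait(timeout=5)
    # System call: fetch the argument again and act on whatever is there.
    used = shared["path"]
    t.join(timeout=5)
    return used
```

The attacker needs no signal from the kernel side: the appearance of the replacement value in its own memory is the synchronization event.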

For these and other reasons, Linux does not export its system call table and actively discourages programmers from taking this approach. There are no real solutions to the problems Watson has identified unless the system call wrapping technique is abandoned. He suggests two alternatives: moving to a "message passing" architecture for system calls, or integrating the security checks into the kernel itself. He specifically mentions the Linux Security Modules approach as one that alleviates the system call wrapper race.

It is unfortunate that there are still many uses of system call wrapping in today's free operating systems. While the specific problems that Watson describes may not have been known, wrappers as a source of security bugs certainly have been. It is a seductive technique, one that seems simple to implement and foolproof, but it is clearly fraught with peril. The BSD family needs to find other ways to implement their security applications, as do any Linux vendors who have ignored the kernel developers and continued to use the wrapping technique.

Please educate a curious cat

Posted Aug 15, 2007 23:27 UTC (Wed) by felixfix (subscriber, #242) [Link]

I understand what is going on; a pointer or some other piece of user data is changed by a user program, in a different thread probably, between validation and use.

I haven't written this kind of code; the last OS work I did passed all syscall parameters in registers. But I am a bit confused. Wouldn't it be very simple to avoid these race conditions by copying the user data to kernel memory before validating? Obviously this wouldn't work with the infamous setuid switcheroo, but for syscall parameters, it would seem to work very well. The only case I can think of to make it difficult would be where the user data in question is too large for easy copying to kernel memory.

Please educate a curious cat

Posted Aug 16, 2007 7:27 UTC (Thu) by kleptog (subscriber, #1183) [Link]

I think the point is that the system call wrapping was supposed to be cheap and quick, hence the wanting to avoid copying the data twice. The wrapper gets the data exactly the same way as the system call.

What you suggest (copying data then checking) is I think pretty much what LSM does. Rather than just wrapping the system call, it gets called *after* the kernel has copied the data to kernel space. Thus it's safer, but not as easy to write...

Please educate a curious cat

Posted Aug 16, 2007 7:52 UTC (Thu) by kilpatds (subscriber, #29339) [Link]

As one of the authors of GSWTK.... (We knew about the issue. GSWTK was
a research project, not intended for production use)

We stopped working on it before the vsyscall method was adopted, so
please limit these comments to the software interrupt syscall method.

Someone has to copy in all "complex" data into kernel space before
operating on it. In Linux, that is done by the sys_* methods that
implement the system calls.

GSWTK Wrappers replace the system call vector, so they are called before
the sys_* call. So the wrapper has to copy in the data to analyze it,
but has to hand the original system call a pointer that can be copied in.
That is, a pointer in user space.

We could copy the data to somewhere else in userspace (a page we allocate
in their process space), make that page not writable by the program, and
pass that pointer in. But this is just a band-aid. The process could
reset the flags on the page and change the data. It just shrinks the
race period. It doesn't fix the fundamental race.

If one were a kernel developer who wanted to support wrapper-like
interposition, you could add a layer to enable it. The base system call
would copy data in, then call the method that actually implemented the
logic. This would provide an alternate interposition point. But it
would slow everyone else down. I can't imagine such a feature making it
in.

Doug

Correct - the approaches work fine when race conditions are eliminated

Posted Aug 16, 2007 8:35 UTC (Thu) by dwheeler (subscriber, #1216) [Link]

Correct; the attacks ONLY work if the design permits race conditions. The notion that user-space data will stay unchanged during a kernel call is untrue in practically all of today's OSs, and this attack worked in the 1960s and 1970s too (it's well-documented). The solutions are well-documented, too: eliminate the race condition. The "easy" way is to copy all data into the kernel, and then use that protected version. The trick is to get good performance as well.
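The copy-then-check pattern described here can be sketched as follows. It is a toy model under stated assumptions: open_with_snapshot, is_allowed and in_the_gap are hypothetical names, a bytearray stands in for the caller's buffer, and an immutable bytes copy stands in for kernel memory.

```python
def open_with_snapshot(user_buf, is_allowed, in_the_gap=None):
    # Copy in exactly once; the bytes object cannot be changed afterwards.
    snapshot = bytes(user_buf)
    # Check the copy, never the caller's buffer.
    if not is_allowed(snapshot):
        raise PermissionError(snapshot.decode())
    # The caller can still scribble on its own buffer at this point.
    if in_the_gap is not None:
        in_the_gap(user_buf)
    # Use the same copy that was checked.
    return b"opened " + snapshot


def demo():
    buf = bytearray(b"/tmp/notes")

    def retarget(b):
        b[:] = b"/etc/shadow"

    result = open_with_snapshot(
        buf, lambda p: p.startswith(b"/tmp/"), retarget
    )
    return result, bytes(buf)
```

The attacker's write still lands in its own buffer, but it no longer matters, because the check and the use both read the private copy.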

Correct - the approaches work fine when race conditions are eliminated

Posted Aug 23, 2007 1:13 UTC (Thu) by Cato (subscriber, #7643) [Link]

Indeed - this model of 'copy first then check' was known as 'touch once programming' over 20 years ago, so there's little excuse for repeating this mistake again. Perhaps what's needed is smarter static analysis tools that can point out this sort of error?

Getting good performance is a challenge, but with the speed of modern CPUs I'd rather spend some CPU cycles on copying than spend many administrator hours responding to a security breach.

Exploiting races in system call wrappers

Posted Aug 16, 2007 12:00 UTC (Thu) by flewellyn (subscriber, #5047) [Link]

Well, I can see why the "check inside the kernel" method works. But, I'm curious, how would this "message-passing" system work, that would eliminate the races?

Exploiting races in system call wrappers

Posted Aug 16, 2007 14:06 UTC (Thu) by ms (subscriber, #41272) [Link]

Well if you're using message passing, then once you've sent the message, you no longer have access to the data. This is hinting at what the real problem is: shared memory.

Erlang, with its message passing, can't ever have this sort of problem. Now of course, to make the implementation run fast, it uses shared memory under the bonnet, but this is never exposed to the programmer, so it's safe (err, well...). Other programming paradigms would also be able to protect programmers from this sort of issue - STM and other transactional memory systems may have ideas to contribute in this area too.

Exploiting races in system call wrappers

Posted Aug 16, 2007 16:04 UTC (Thu) by flewellyn (subscriber, #5047) [Link]

Sounds rather like the old "safety vs. performance" issue. Joy.

I can think of ways to make shared memory safe, in general, but most of them involve either locking critical sections, or using some kind of multiversion concurrency control, like many DBMSes do. Either one is going to cost.

In the "TOCTTOU" case, I suppose locking the "check and use" section of the code somehow, so that no other processes could access the resource being checked, would work, but again, performance hit. And complicated. And I might be wrong anyway, and that doesn't work after all.
