On Wed, Oct 31, 2018 at 12:44 PM, Oleg Nesterov <o...@redhat.com> wrote:
> On 10/30, Eric W. Biederman wrote:
>>
>> At a bare minimum you need to perform the permission check using the
>> credentials of the opener of the file. Which means refactoring
>> kill_pid so that you can perform the permission check for killing the
>> application during open.

Absolutely right. Thanks for spotting that.

> perhaps it would be simpler to do
>
>         my_cred = override_creds(file->f_cred);
>         kill_pid(...);
>         revert_creds(my_cred);

Thanks for the suggestion. That looks neat, but it's not quite enough.
The problem is that check_kill_permission looks for
same_thread_group(current, t) _before_ checking kill_ok_by_cred, so
with just this code snippet, it'd still be possible for an
unprivileged process to open /proc/$PRIVILEGED_PID/kill and hand the
FD to $PRIVILEGED_PID, which would write to it and kill itself or any
of its threads. I think, with some rearrangement of the permission
checks, this problem can be overcome.

There's another problem though: say we open /proc/5/kill *, with PID 5
being an ordinary unprivileged process, e.g., the shell. At open(2)
time, the access check passes. Now suppose PID 5 execve(2)s into a
setuid process. The kill FD is still open, so the kill FD's holder can
send a signal to a process it normally wouldn't be able to kill.

You might say, "let's somehow invalidate open kill FDs upon setuid
exec", but the problem that then results is that a legitimate
privileged user of /proc/pid/kill (say, Android lmkd) might see its
/proc/pid/kill handle spuriously become invalidated if the process to
which it refers execs in a setuid way. Maybe in this case we could
make write(2) on the kill FD fail with ECHILD ("no child process"?)
and have callers, if they see ECHILD, close the kill FD, re-open it,
and try again. But now we're getting into an interface that's too
complicated to use from the shell.

Maybe a simpler approach would be to bind the kill FD to the struct
cred that opened it and keep the access check in write(2): a write(2)
with current->cred not equal to f_cred would fail with EPERM. This
way, you could play standard-output-of-setuid-program or SCM_RIGHTS
games with the kill FD itself, but you wouldn't be able to do anything
with the FD having done so.

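To make the last idea concrete, here is a minimal userspace model of
the proposed write(2)-time check. It is only a sketch, not kernel code:
struct model_cred, struct model_kill_file, model_may_kill() and
model_kill_write() are hypothetical names invented for illustration,
and the "root or same uid" rule is a deliberate simplification of what
kill_ok_by_cred actually checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not real kernel types. */
struct model_cred {
    unsigned int euid;
};

struct model_kill_file {
    const struct model_cred *f_cred; /* cred captured at open() time */
    int target_pid;
};

/* Simplified rule (assumption): root may signal anyone, otherwise the
 * sender's euid must match the target's uid. */
bool model_may_kill(const struct model_cred *c, unsigned int target_uid)
{
    return c->euid == 0 || c->euid == target_uid;
}

/* Returns 0 on success or a negative errno value, in the style of a
 * kernel write handler. */
int model_kill_write(const struct model_kill_file *f,
                     const struct model_cred *writer,
                     unsigned int target_uid_now)
{
    assert(f != NULL && writer != NULL);

    /* The FD is bound to its opener: any other cred gets EPERM, so
     * passing the FD around (SCM_RIGHTS, inherited stdout) is useless. */
    if (f->f_cred != writer)
        return -EPERM;

    /* The access check stays in write(): it is evaluated against the
     * target's uid at write time, so a setuid exec after open() is
     * caught here. */
    if (!model_may_kill(writer, target_uid_now))
        return -EPERM;

    return 0;
}
```

In this model, a writer whose cred pointer differs from the opener's
gets -EPERM even if it has the same euid, and the opener itself gets
-EPERM once the target's uid no longer matches.
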
Honestly, though, maybe a new procfs_sigqueue(2) system call would be
simpler and more robust. With a single system call, we wouldn't split
the permissions check between open(2) and write(2), and so the whole
problem disappears. The downside is that you wouldn't be able to use
the new feature via the shell without a helper program. :-(

What do you think?

* I actually have a local variant of the patch that would have you
open "/proc/$PID/kill/$SIGNO" instead, since different signal numbers
have different permission checks. This approach is kind of neat, since
/proc/$PID/kill/$SIGNO would act as an "option" to kill a process only
with a particular signal, and a write(2) to a /proc/$PID/kill/$SIGNO
file would allow you to specify a sigqueue(2)-style siginfo value
instead of the signal number (since the signal number is already
encoded in the filename). For example, a privileged process could open
/proc/self/kill/10 (SIGUSR1) and hand the FD to an unprivileged
process, letting that process report completion of some operation (by
sending SIGUSR1) without giving that unprivileged process the ability
to send *any* signal to the privileged process. But eventfd is almost
certainly a better choice in this situation anyway, I think.
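
For comparison, the eventfd alternative mentioned in the footnote can
be sketched as follows. This is an illustration under stated
assumptions, not part of the patch: wait_for_worker_completion() is a
hypothetical helper name, and the worker here is a plain fork(2) child
that has not actually dropped privileges. It uses only the existing
eventfd(2) interface: the worker reports completion by adding 1 to the
counter, and the parent blocks in read(2) until that happens.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a worker that reports completion through
 * an eventfd. Returns the counter value read by the parent, or -1 on
 * error. */
int wait_for_worker_completion(void)
{
    int efd = eventfd(0, EFD_CLOEXEC);
    if (efd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(efd);
        return -1;
    }

    if (pid == 0) {
        /* Worker: the inherited FD lets it bump the counter, and
         * nothing else; it cannot deliver a signal through it. */
        uint64_t one = 1;
        ssize_t n = write(efd, &one, sizeof(one));
        _exit(n == (ssize_t)sizeof(one) ? 0 : 1);
    }

    /* Parent: read(2) blocks until the counter is nonzero, then
     * returns its value and resets it to zero. */
    uint64_t count = 0;
    ssize_t n = read(efd, &count, sizeof(count));

    int status = 0;
    waitpid(pid, &status, 0);
    close(efd);

    if (n != (ssize_t)sizeof(count))
        return -1;

    assert(count >= 1); /* a successful eventfd read is never zero */
    return (int)count;
}
```

With a single worker writing once, the helper returns 1. The same FD
could instead be handed over with SCM_RIGHTS; either way the holder
gains only the ability to wake the reader.
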