So here is my RFC as an alternative. Thoughts? Please note that we
currently have only very few users of use_mm() API in the kernel
so a risk of a regression is not really high. usb/gadget are using it
only temporarily. The remaining is vhost which operates on a remote mm
and I have no idea whether somebody might abuse /proc/vhost/mem or
anything - let's add Michael to the CC list. I am pretty sure nobody
abuse oom_reaper proc directory as this one is pretty new and such a
usage would be pretty much undefined as the reaper unmaps the address
space.
---
>From 6a1a9fca2871ada365b465382a3f89a1506c312d Mon Sep 17 00:00:00 2001
From: Michal Hocko <mho...@suse.com>
Date: Wed, 19 Oct 2016 19:02:25 +0200
Subject: [PATCH] proc: do not allow to open file requiring mm for kernel
 threads

Leon Yu has reported the following NULL ptr oops
$ cat /proc/2/auxv
[    8.964445] Unable to handle kernel NULL pointer dereference at virtual 
address 000000a8
[    8.972555] pgd = e99e0000
[    8.975282] [000000a8] *pgd=399e6835, *pte=00000000, *ppte=00000000
[    8.981572] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
[    8.986967] Modules linked in:
[    8.990029] CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1
[    8.996379] Hardware name: BCM2709
[    8.999778] task: ea3b0b00 task.stack: e99b2000
[    9.004314] PC is at auxv_read+0x24/0x4c
[    9.008241] LR is at do_readv_writev+0x2fc/0x37c
[...]
[    9.261895] [<b0135f80>] (auxv_read) from [<b00e5900>] 
(do_readv_writev+0x2fc/0x37c)
[    9.269651] [<b00e5900>] (do_readv_writev) from [<b00e59c0>] 
(vfs_readv+0x40/0x58)
[    9.277234] [<b00e59c0>] (vfs_readv) from [<b010eed4>] 
(default_file_splice_read+0x140/0x240)
[    9.285769] [<b010eed4>] (default_file_splice_read) from [<b010eb4c>] 
(splice_direct_to_actor+0xa0/0x230)
[    9.295345] [<b010eb4c>] (splice_direct_to_actor) from [<b010ed6c>] 
(do_splice_direct+0x90/0xb8)
[    9.304140] [<b010ed6c>] (do_splice_direct) from [<b00e5e38>] 
(do_sendfile+0x1a0/0x308)
[    9.319823] [<b00e67d4>] (SyS_sendfile64) from [<b000f300>] 
(ret_fast_syscall+0x0/0x34)
[    9.327829] Code: e1a01002 e1a02003 e1a03004 e2833008 (e593e0a0)
[    9.333973] ---[ end trace d3f50081f24b99ce ]---

This has been introduced by c5317167854e ("proc: switch auxv to use
of __mem_open()") but it shows a deeper problem we have had for a
longer time. __mem_open resp. proc_mem_open allows to open a file which
requires the address space even when there is none. This means that all
kernel code paths which use __mem_open have to check for mm!=NULL. This
is error prone as the above shows and also doesn't make much sense in
general. A task doesn't have an mm even when it is already exiting or
when it is a kernel thread. The later will not have an mm unless it
hijacks it from another task when the output might be really misleading
without a deeper knowledge of the particular kernel thread.

Chane proc_mem_open to disallow opening a file if the mm is NULL and
return ESRCH. This is an user visible change theoretically but every
user other than /proc/self/$file has to cope with ESRCH already and
relying on kthread giving any output is just an abuse of the interface.

This will make users f proc_mem_open less error prone as well.

Fixes: c5317167854e ("proc: switch auxv to use of __mem_open()")
Reported-by: Leon Yu <chianglun...@gmail.com>
Signed-off-by: Michal Hocko <mho...@suse.com>
---
 fs/proc/base.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 5ef2ae81f623..d34d33dbf1b2 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -804,10 +804,10 @@ static const struct file_operations 
proc_single_file_operations = {
 struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
 {
        struct task_struct *task = get_proc_task(inode);
-       struct mm_struct *mm = ERR_PTR(-ESRCH);
+       struct mm_struct *ret = ERR_PTR(-ESRCH);
 
        if (task) {
-               mm = mm_access(task, mode | PTRACE_MODE_FSCREDS);
+               struct mm_struct *mm = mm_access(task, mode | 
PTRACE_MODE_FSCREDS);
                put_task_struct(task);
 
                if (!IS_ERR_OR_NULL(mm)) {
@@ -815,10 +815,11 @@ struct mm_struct *proc_mem_open(struct inode *inode, 
unsigned int mode)
                        atomic_inc(&mm->mm_count);
                        /* but do not pin its memory */
                        mmput(mm);
+                       ret = mm;
                }
        }
 
-       return mm;
+       return ret;
 }
 
 static int __mem_open(struct inode *inode, struct file *file, unsigned int 
mode)
-- 
2.9.3

-- 
Michal Hocko
SUSE Labs

Reply via email to