On Sun, 17 Jul 2016, Alexei Starovoitov wrote:

On Sun, Jul 17, 2016 at 03:19:13AM -0700, Sargun Dhillon wrote:

+static u64 bpf_copy_to_user(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+       void *to = (void *) (long) r1;
+       void *from = (void *) (long) r2;
+       int  size = (int) r3;
+
+       /* check if we're in a user context */
+       if (unlikely(in_interrupt()))
+               return -EINVAL;
+       if (unlikely(!current->pid))
+               return -EINVAL;
+
+       return copy_to_user(to, from, size);
+}

Thanks for the patch. Unfortunately, it's not that straightforward.
copy_to_user might fault. Try enabling CONFIG_DEBUG_ATOMIC_SLEEP and
you'll see the splat, since bpf programs are protected by rcu.
Also, 'current' can be null, and I'm not sure what current->pid does.
So the write to user memory either has to be verified to avoid
sleeping and faults, or we need to use something like the task_work_add
mechanism. Ideas are certainly welcome.


From casual inspection, I can't find a case where current is null while
in_interrupt() is false, although we can check it before dereferencing it. When we're not in a user context, the pid in the task struct is 0.

As for preventing sleep, would the following change do, or do we actually need something more sophisticated?
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index be89c148..45878f3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -86,14 +86,19 @@ static u64 bpf_copy_to_user(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
        void *to = (void *) (long) r1;
        void *from = (void *) (long) r2;
        int  size = (int) r3;
+       struct task_struct *task = current;

        /* check if we're in a user context */
        if (unlikely(in_interrupt()))
                return -EINVAL;
-       if (unlikely(!current->pid))
+       if (unlikely(!task || !task->pid))
                return -EINVAL;

-       return copy_to_user(to, from, size);
+       /* Is this a user address, or a kernel address? */
+       if (!access_ok(VERIFY_WRITE, to, size))
+               return -EINVAL;
+
+       return probe_kernel_write(to, from, size);
 }

 static const struct bpf_func_proto bpf_copy_to_user_proto = {


probe_kernel_write doesn't block, and the access_ok check prevents BPF programs from writing to kernel memory. Under the hood, probe_kernel_write disables the page fault handler, so a write that would fault fails instead of sleeping.
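
To make the order of the checks easier to reason about, here is a rough userspace model of the revised helper. It is only a sketch, not kernel code: struct model_ctx, model_access_ok and model_copy_to_user are hypothetical names, the page_resident flag merely stands in for "this write would need a page fault", and the negative-size check is an addition that the patch above does not have.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the state the helper inspects. */
struct model_ctx {
	bool in_interrupt;     /* models in_interrupt() */
	bool has_task;         /* models current != NULL */
	int pid;               /* models current->pid */
	uintptr_t user_start;  /* start of the range access_ok() would accept */
	uintptr_t user_end;    /* end of that range (exclusive) */
	bool page_resident;    /* false: the write would need a page fault */
};

/* Models access_ok(): is [addr, addr + size) inside the user range? */
static bool model_access_ok(const struct model_ctx *ctx,
			    uintptr_t addr, size_t size)
{
	if (addr < ctx->user_start || addr > ctx->user_end)
		return false;
	return size <= ctx->user_end - addr;
}

/* Models the revised bpf_copy_to_user(): same checks, same order. */
static long model_copy_to_user(const struct model_ctx *ctx,
			       void *to, const void *from, int size)
{
	/* check if we're in a user context */
	if (ctx->in_interrupt)
		return -EINVAL;
	if (!ctx->has_task || !ctx->pid)
		return -EINVAL;

	/* reject anything outside the user range (assumed extra: size < 0) */
	if (size < 0 || !model_access_ok(ctx, (uintptr_t)to, (size_t)size))
		return -EINVAL;

	/* with page faults disabled, the write fails instead of sleeping */
	if (!ctx->page_resident)
		return -EFAULT;

	memcpy(to, from, (size_t)size);
	return 0;
}
```

In this model every rejected context or address returns -EINVAL before any memory is touched, and only a write that would have needed a page fault returns -EFAULT.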
