On Fri, Jul 22, 2016 at 11:53:52AM +0200, Daniel Borkmann wrote:
> On 07/22/2016 04:14 AM, Alexei Starovoitov wrote:
> >On Thu, Jul 21, 2016 at 06:09:17PM -0700, Sargun Dhillon wrote:
> >>This allows user memory to be written to during the course of a kprobe.
> >>It shouldn't be used to implement any kind of security mechanism
> >>because of TOC-TOU attacks, but rather to debug, divert, and
> >>manipulate execution of semi-cooperative processes.
> >>
> >>Although it uses probe_kernel_write, we limit the address space
> >>the probe can write into by checking the space with access_ok.
> >>This is so the call doesn't sleep.
> >>
> >>Given this feature is experimental, and has the risk of crashing
> >>the system, we print a warning on invocation.
> >>
> >>It was tested with the tracex7 program on x86-64.
> >>
> >>Signed-off-by: Sargun Dhillon <sar...@sargun.me>
> >>Cc: Alexei Starovoitov <a...@kernel.org>
> >>Cc: Daniel Borkmann <dan...@iogearbox.net>
> >>---
> >>  include/uapi/linux/bpf.h  | 12 ++++++++++++
> >>  kernel/bpf/verifier.c     |  9 +++++++++
> >>  kernel/trace/bpf_trace.c  | 37 +++++++++++++++++++++++++++++++++++++
> >>  samples/bpf/bpf_helpers.h |  2 ++
> >>  4 files changed, 60 insertions(+)
> >>
> >>diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>index 2b7076f..4536282 100644
> >>--- a/include/uapi/linux/bpf.h
> >>+++ b/include/uapi/linux/bpf.h
> >>@@ -365,6 +365,18 @@ enum bpf_func_id {
> >>     */
> >>    BPF_FUNC_get_current_task,
> >>
> >>+   /**
> >>+    * bpf_probe_write(void *dst, void *src, int len)
> >>+    * safely attempt to write to a location
> >>+    * @dst: destination address in userspace
> >>+    * @src: source address on stack
> >>+    * @len: number of bytes to copy
> >>+    * Return:
> >>+    *   Returns number of bytes that could not be copied.
> >>+    *   On success, this will be zero
> >
> >that is odd comment.
> >there are only three possible return values 0, -EFAULT, -EPERM
> 
> Agree.
> 
> >>+    */
> >>+   BPF_FUNC_probe_write,
> >>+
> >>    __BPF_FUNC_MAX_ID,
> >>  };
> >>
> >>diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >>index f72f23b..6785008 100644
> >>--- a/kernel/bpf/verifier.c
> >>+++ b/kernel/bpf/verifier.c
> >>@@ -1154,6 +1154,15 @@ static int check_call(struct verifier_env *env, int 
> >>func_id)
> >>            return -EINVAL;
> >>    }
> >>
> >>+   if (func_id == BPF_FUNC_probe_write) {
> >>+           
> >>pr_warn_once("************************************************\n");
> >>+           pr_warn_once("* bpf_probe_write: Experimental Feature in use 
> >>*\n");
> >>+           pr_warn_once("* bpf_probe_write: Feature may corrupt memory  
> >>*\n");
> >>+           
> >>pr_warn_once("************************************************\n");
> >>+           pr_notice_ratelimited("bpf_probe_write in use by: %.16s-%d",
> >>+                                 current->comm, task_pid_nr(current));
> >>+   }
> >
> >I think single line pr_notice_ratelimited() with 'feature may corrupt user 
> >memory'
> >will be enough.
> 
> Agree.
> 
> >Also please move this to tracing specific part into bpf_trace.c
> >similar to bpf_get_trace_printk_proto() instead of verifier.c
> 
> Yes, sorry for not being too clear about it, this spot will then be
> called by the verifier to fetch it for every function call. Meaning that
> this could get printed multiple times for loading a single program, but
> I think it's okay. If single line, I'd make that pr_warn_ratelimited(),
> and probably something like ...

yes, but inside check_call() it may be printed even more times, since the same
call site can be verified multiple times due to different reg types.

>  "note: %s[%d] is installing a program with bpf_probe_write helper that may 
> corrupt user memory!",
>  current->comm, task_pid_nr(current)

sounds good to me.

> 
> You can make that 'current->flags & (PF_KTHREAD | PF_EXITING)' and
> we don't need the extra task var either.

+1

> >>+   if (unlikely(segment_eq(get_fs(), KERNEL_DS)))
> >>+           return -EPERM;
> >>+   if (!access_ok(VERIFY_WRITE, unsafe_ptr, size))
> >>+           return -EPERM;
> >>+
> >>+   return probe_kernel_write(unsafe_ptr, src, size);
> >>+}
> >>+
> >>+static const struct bpf_func_proto bpf_probe_write_proto = {
> >>+   .func           = bpf_probe_write,
> >>+   .gpl_only       = true,
> >>+   .ret_type       = RET_INTEGER,
> >>+   .arg1_type      = ARG_ANYTHING,
> >>+   .arg2_type      = ARG_PTR_TO_STACK,
> >>+   .arg3_type      = ARG_CONST_STACK_SIZE,
> >
> >I have 2nd thoughts on naming.
> >I think 'consistency' with probe_read is actually hurting here.
> >People derive semantic of the helper mainly from the name.
> >If we call it bpf_probe_read, it would mean that it's generic
> >writing function, whereas bpf_copy_to_user has clear meaning
> >that it's writing to user memory only.
> >In other words bpf_probe_read and bpf_copy_to_user _are_ different
> >functions with purpose that is easily seen in the name,
> >whereas bpf_probe_read and bpf_probe_write look like a pair,
> >but they're not. probe_read can read kernel and user memory,
> >whereas probe_write only user.
> 
> Okay, but still I think that bpf_copy_to_user is a bit of a weird name
> for that helper, I associate copy_to_user mostly with data buffers or
> structures passed around with syscalls, but not necessarily with something
> else, meaning it's not quite obvious to me deriving this from the name
> to what this helper can achieve.

vhost is copying vring descriptors to user space with it,
In that path there are no syscalls in sight ;)

> How about "bpf_probe_write_user"? It still keeps basic semantics of
> bpf_probe_read, but for the writing part and "user" makes it pretty
> clear that it's not for kernel memory, plus people familiar with
> bpf_probe_read already will make sense on what bpf_probe_write_user
> is about much faster.

I like it. Let's do bpf_probe_write_user then.

Reply via email to