On Thu, Jul 21, 2016 at 06:09:17PM -0700, Sargun Dhillon wrote: > This allows user memory to be written to during the course of a kprobe. > It shouldn't be used to implement any kind of security mechanism > because of TOC-TOU attacks, but rather to debug, divert, and > manipulate execution of semi-cooperative processes. > > Although it uses probe_kernel_write, we limit the address space > the probe can write into by checking the space with access_ok. > This is so the call doesn't sleep. > > Given this feature is experimental, and has the risk of crashing > the system, we print a warning on invocation. > > It was tested with the tracex7 program on x86-64. > > Signed-off-by: Sargun Dhillon <sar...@sargun.me> > Cc: Alexei Starovoitov <a...@kernel.org> > Cc: Daniel Borkmann <dan...@iogearbox.net> > --- > include/uapi/linux/bpf.h | 12 ++++++++++++ > kernel/bpf/verifier.c | 9 +++++++++ > kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++ > samples/bpf/bpf_helpers.h | 2 ++ > 4 files changed, 60 insertions(+) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 2b7076f..4536282 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -365,6 +365,18 @@ enum bpf_func_id { > */ > BPF_FUNC_get_current_task, > > + /** > + * bpf_probe_write(void *dst, void *src, int len) > + * safely attempt to write to a location > + * @dst: destination address in userspace > + * @src: source address on stack > + * @len: number of bytes to copy > + * Return: > + * Returns number of bytes that could not be copied. > + * On success, this will be zero
that is odd comment. there are only three possible return values 0, -EFAULT, -EPERM > + */ > + BPF_FUNC_probe_write, > + > __BPF_FUNC_MAX_ID, > }; > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index f72f23b..6785008 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -1154,6 +1154,15 @@ static int check_call(struct verifier_env *env, int > func_id) > return -EINVAL; > } > > + if (func_id == BPF_FUNC_probe_write) { > + > pr_warn_once("************************************************\n"); > + pr_warn_once("* bpf_probe_write: Experimental Feature in use > *\n"); > + pr_warn_once("* bpf_probe_write: Feature may corrupt memory > *\n"); > + > pr_warn_once("************************************************\n"); > + pr_notice_ratelimited("bpf_probe_write in use by: %.16s-%d", > + current->comm, task_pid_nr(current)); > + } I think single line pr_notice_ratelimited() with 'feature may corrupt user memory' will be enough. Also please move this to tracing specific part into bpf_trace.c similar to bpf_get_trace_printk_proto() instead of verifier.c > +static u64 bpf_probe_write(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5) > +{ > + void *unsafe_ptr = (void *) (long) r1; > + void *src = (void *) (long) r2; > + int size = (int) r3; > + struct task_struct *task = current; > + > + /* bpf_trace.c follows non-net comment style, so it's good here. just distracting vs the rest of net style. > + * Ensure we're in a user context which it is safe for the helper > + * to run. This helper has no business in a kthread > + * > + * access_ok should prevent writing to non-user memory, but on > + * some architectures (nommu, etc...) access_ok isn't enough > + * So we check the current segment > + */ > + > + if (unlikely(in_interrupt() || (task->flags & PF_KTHREAD))) > + return -EPERM; Should we also add a check for !PF_EXITING ? Like signals are not delivered to such tasks and I'm not sure what would be the state of mm of it. > + if (unlikely(segment_eq(get_fs(), KERNEL_DS))) > + return -EPERM; > + if (!access_ok(VERIFY_WRITE, unsafe_ptr, size)) > + return -EPERM; > + > + return probe_kernel_write(unsafe_ptr, src, size); > +} > + > +static const struct bpf_func_proto bpf_probe_write_proto = { > + .func = bpf_probe_write, > + .gpl_only = true, > + .ret_type = RET_INTEGER, > + .arg1_type = ARG_ANYTHING, > + .arg2_type = ARG_PTR_TO_STACK, > + .arg3_type = ARG_CONST_STACK_SIZE, I have 2nd thoughts on naming. I think 'consistency' with probe_read is actually hurting here. People derive semantic of the helper mainly from the name. If we call it bpf_probe_read, it would mean that it's generic writing function, whereas bpf_copy_to_user has clear meaning that it's writing to user memory only. In other words bpf_probe_read and bpf_copy_to_user _are_ different functions with purpose that is easily seen in the name, whereas bpf_probe_read and bpf_probe_write look like a pair, but they're not. probe_read can read kernel and user memory, whereas probe_write only user. So bpf_copy_to_user is a more suitable name.