Thanks! This is very useful for my experiments.

My use case pertains to safely altering execution flow in the kernel at
runtime based on kprobes from user space.

This could have many useful applications, for example accelerating file
system calls by conditionally returning to user space instead of
continuing execution after the probed instruction in the kernel:

*Original kernel function:*

SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t,
                mode)
{
    if (force_o_largefile())
        flags |= O_LARGEFILE;

    return do_sys_open(AT_FDCWD, filename, flags, mode);
}

*eBPF kprobe:*

int kprobe__sys_open(struct pt_regs *ctx)
{
    char buf[256];
    char *fname = (char *) PT_REGS_PARM1(ctx);
    int flags = PT_REGS_PARM2(ctx);

    // only applied to file system entries under '/mnt/'
    bpf_probe_read(buf, sizeof(buf), fname);
    if (buf[0] != '/' || buf[1] != 'm' || buf[2] != 'n' || buf[3] != 't' ||
        buf[4] != '/') {
        return 0;
    }

    // check access
    if (!check_access_perm(fname, flags)) {
        // TODO: this should return to user space and not continue
        // executing the syscall after the probed instruction
        ctx->ax = -EPERM; // return to userspace
    }

    // continue syscall execution
    return 0;
}
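
As a side note, the path test in the probe can be checked in isolation.
The following is only a minimal user-space sketch in plain C (not BPF
code) of the intended comparison against the "/mnt/" prefix. The helper
name path_is_under_mnt is hypothetical; it is not part of bcc or the
kernel, and whether the same code is usable inside a BPF program is not
something this sketch shows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from bcc or the kernel): returns 1 when the
 * first bytes of buf spell "/mnt/", and 0 otherwise.
 * buf must hold at least len readable bytes; len plays the role of
 * sizeof(buf) in the probe above. */
static int path_is_under_mnt(const char *buf, size_t len)
{
    static const char prefix[] = "/mnt/";
    const size_t plen = sizeof(prefix) - 1; /* exclude the trailing NUL */

    assert(buf != NULL);

    /* A buffer shorter than the prefix can never match. */
    if (len < plen)
        return 0;

    /* Compare exactly the five prefix bytes, including the final '/'. */
    return memcmp(buf, prefix, plen) == 0;
}
```

With this helper, "/mnt/data" matches, while "/mnt" (no trailing slash),
"/mntx/a" and "/home/a" do not, which is the behaviour the comment in the
probe asks for.
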

I understand that part of this functionality (conditional syscall
termination) would have to be enabled in the kprobes infrastructure, but
I was wondering whether anybody has tried this, and whether it is
feasible at all.

Thanks for your help!
Riya
On Mon, May 8, 2017 at 2:55 PM, Gianluca Borello via iovisor-dev <
[email protected]> wrote:
> On Sun, May 7, 2017 at 10:20 PM, riya khanna via iovisor-dev
> <[email protected]> wrote:
> > Hi,
> >
> > Is it possible to use kprobe from ebpf and modify syscall arguments or
> > return value?
> >
>
> Hi,
>
> A while ago, I experimented with something similar, and found no
> satisfactory solution at that time, which is probably ok since a
> feature like this could make BPF instrumentation become potentially
> very unsafe. That being said, I will share what I found at that time,
> but would certainly appreciate some more authoritative response :-)
>
> Using the bpf_probe_write_user() helper, you can overwrite userspace
> memory. This means that if a system call argument is passed by
> reference, you can try to mangle it at runtime by modifying the memory
> the argument points to in userspace. I wouldn't do any of this in
> production as it's very dirty and my tests were purely educational and
> limited to a debugging use case, but in this example you can see how I
> can quickly make all the open() calls to the file "foo1" go to "foo2"
> instead:
>
> b.attach_kprobe(event="sys_open", fn_name="trace_entry")
> ...
> int trace_entry(struct pt_regs *ctx)
> {
>     char buf[10];
>     char foo2[] = "foo2";
>     char *fname = (char *) PT_REGS_PARM1(ctx);
>
>     bpf_probe_read(buf, sizeof(buf), fname);
>     if (buf[0] != 'f' || buf[1] != 'o' || buf[2] != 'o' || buf[3] != '1') {
>         return 0;
>     }
>
>     bpf_probe_write_user(fname, foo2, sizeof(foo2));
>
>     return 0;
> };
>
> It's just dirty test code, but it sort of achieves what you want:
>
> gianluca@sid:~$ cat foo1
> this is foo1 content
> gianluca@sid:~$ cat foo2
> this is foo2 content
> gianluca@sid:~$ sudo bcc_mangle_open.py &
> [1] 63453
> gianluca@sid:~$ cat foo1
> this is foo2 content
>
> At the moment, bpf_probe_write_user() is not allowed to modify kernel
> memory, and the original registers (where the system call arguments are
> passed) are saved on the kernel stack before the kprobe handler
> invocation, so I didn't find a way to change arguments passed by
> value, even if they are accessible to read through the ctx pointer.
>
> In theory, if bpf_probe_write_user() allowed at least write access to
> part of the struct pt_regs saved on the kernel stack where the
> arguments are, it would be possible to change all arguments.
>
> The same goes for return values. I actually tried (again, just
> dirty test code for my own education) a simple kernel change
> (https://github.com/gianlucaborello/linux/commit/d1dd6bef91b408a76d4b458211dbc6a86476c9c6)
> to allow BPF programs attached to a kretprobe to change part of the
> saved registers structure where the return code is usually stored,
> this way you can also alter the return code of generic functions
> running in the kernel (haven't tried with syscalls specifically, but
> you get the idea).
>
> While I was writing this, I was actually wondering whether you could
> have better luck by playing with bpf_probe_write_user() while
> intercepting syscall invocations in userspace with uprobes instead of
> kprobes, but I've never tried it so I don't want to risk saying
> something stupid.
>
> Once again, this is all just experimental and the fact that you get
> significant warnings in the dmesg whenever you use
> bpf_probe_write_user() is a sign that this should be limited to
> debugging purposes of non-production systems :-)
>
> Hope this is of some help, and I am as curious as you to hear other
> opinions.
>
> Thanks
> _______________________________________________
> iovisor-dev mailing list
> [email protected]
> https://lists.iovisor.org/mailman/listinfo/iovisor-dev
>