On Mon, May 8, 2017 at 2:12 PM, riya khanna <[email protected]> wrote:
> Thanks! This is very useful for my experiments.
>
> My use case pertains to safely altering execution flow in the kernel at
> runtime based on kprobes from user space.
>

I'll add a few more words, since your use case is now a bit more
detailed. I am not an expert on this, though, and would be happy to
see more authoritative answers.

I think what you are trying to do is definitely interesting, but I
don't think it's currently feasible, although it could be at some
point and I am hopeful :-)

In particular, the techniques that come to mind for terminating a
system call early pretty much rely on the return value of
`syscall_trace_enter()` [1], since that seems to be a clean place that
is already used for this purpose. Considering upstream as of 4.11, the
only way I can see to do that from a BPF program is seccomp, which is
called from that function: it's the only place where the return value
of BPF_PROG_RUN is actually used to make a decision on the system call
path (since, as I said earlier, bpf_probe_write_user() can't be used
to alter kernel state). Obviously, this is not quite the same as
writing a full eBPF program.

For this reason, I think that ultimately having a way to alter a
subset of the kernel state would be pretty much the key enabler for
such a feature. That could mean relaxing bpf_probe_write_user(),
relaxing the verifier in a less ugly way than [2], or somehow
interpreting the return value of a BPF kretprobe and substituting it
for the original return code before registers are restored. If
someone else is interested in this (even just to say that I'm
proposing something very idiotic), I would definitely be thrilled to
hear opinions and invest some time!

On the other hand, if you are willing to relax some of your
constraints and filter at the LSM hook level rather than at the system
call level (which covers a lot of system calls anyway), Landlock [3]
is looking very promising.
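
To make the seccomp point above a bit more concrete, here is a rough
sketch of how a filter's return value ends up steering the system call
path. It is a toy model in Python, not kernel code: the opcode and
verdict constants are the ones from <linux/bpf_common.h> and
<linux/seccomp.h>, but the function names (build_filter, run_filter,
syscall_verdict, pack_filter) are made up for illustration, the
little-endian seccomp_data layout is an assumption, and 39 is assumed
to be getpid on x86_64. A real filter should also check the arch field
before trusting the syscall number.

```python
import errno
import struct

# Classic BPF opcodes (values from <linux/bpf_common.h>).
BPF_LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS: load a 32-bit word
BPF_JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K: compare with a constant
BPF_RET_K = 0x06     # BPF_RET | BPF_K: return a constant verdict

# seccomp verdicts and masks (values from <linux/seccomp.h>).
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ACTION = 0x7FFF0000
SECCOMP_RET_DATA = 0x0000FFFF


def build_filter(blocked_nr):
    """Return (code, jt, jf, k) tuples: fail blocked_nr with EPERM."""
    return [
        (BPF_LD_W_ABS, 0, 0, 0),            # A = seccomp_data.nr (offset 0)
        (BPF_JEQ_K, 0, 1, blocked_nr),      # match: fall through, else skip 1
        (BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | errno.EPERM),
        (BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),
    ]


def run_filter(insns, nr, arch=0):
    """Toy interpreter for the three opcodes used above."""
    # struct seccomp_data: int nr; u32 arch; u64 ip; u64 args[6]
    data = struct.pack("<iIQ6Q", nr, arch, 0, *([0] * 6))
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = insns[pc]
        pc += 1  # jump offsets are relative to the next instruction
        if code == BPF_LD_W_ABS:
            acc = struct.unpack_from("<I", data, k)[0]
        elif code == BPF_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError("opcode not modelled: %#x" % code)


def syscall_verdict(insns, nr):
    """Model of the decision taken on the filter's return value."""
    ret = run_filter(insns, nr)
    action = ret & SECCOMP_RET_ACTION
    if action == SECCOMP_RET_ERRNO:
        # The syscall body is skipped; the caller sees -errno.
        return ("skip", -(ret & SECCOMP_RET_DATA))
    if action == SECCOMP_RET_ALLOW:
        return ("run", None)
    raise ValueError("verdict not modelled: %#x" % ret)


def pack_filter(insns):
    """Pack the tuples as an array of struct sock_filter (8 bytes each)."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in insns)


prog = build_filter(39)  # 39: assumed to be getpid on x86_64
print(syscall_verdict(prog, 39))
print(syscall_verdict(prog, 1))
```

The bytes from pack_filter() are what would go into a struct
sock_fprog for prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...). The
takeaway is that the filter can only veto or allow the call based on
its number and arguments; it cannot otherwise touch kernel state.
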
I for one learned a lot from reading the Landlock code. I think the
approach is very elegant, and most of the hooks are called before any
serious work is done in the system calls, so it could potentially
enable your use case. It is not upstream yet, though.

I definitely don't know what the gurus think, but I think there would
be some value in pursuing ideas such as this one. Being able to
partially mutate some of the kernel state right from a k(ret)probe
could extend some of the great concepts that XDP introduced in the
networking stack to the system call path (and potentially to other
subsystems) as well, although I fully recognize the analogy is a bit
of a stretch.

Thanks

[1] https://github.com/torvalds/linux/blob/master/arch/x86/entry/common.c#L65
[2] https://github.com/gianlucaborello/linux/commit/d1dd6bef91b408a76d4b458211dbc6a86476c9c6
[3] https://lwn.net/Articles/715203/

_______________________________________________
iovisor-dev mailing list
[email protected]
https://lists.iovisor.org/mailman/listinfo/iovisor-dev
