On Mon, May 8, 2017 at 2:12 PM, riya khanna <[email protected]> wrote:
> Thanks! This is very useful for my experiments.
>
> My use case pertains to safely altering execution flow in the kernel at
> runtime based on kprobes from user space.
>

I'll throw some more words in there, since your use case is now a
little bit more detailed, although I am not an expert on this and will
really be happy to see some more authoritative answers.

I think what you are trying to do is definitely interesting, but I
don't think it's currently feasible, although it could be at some
point and I am hopeful :-)

In particular, the techniques that come to mind for terminating a
system call early pretty much all rely on the return value of
`syscall_trace_enter()` [1], since that is a clean place that is
already used for this purpose. In current upstream (4.11), the only
way I can see to do that from a BPF program is seccomp, which is
called from that function: it's the only place where the return value
of BPF_PROG_RUN is actually used to make a decision on the system call
path (as I said earlier, bpf_probe_write_user() can't be used to alter
kernel state). Obviously, this is not quite the same as writing a full
eBPF program, since seccomp filters are still classic BPF.
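
To make the seccomp route a bit more concrete, here is a minimal
sketch of a classic BPF filter that makes one system call fail early
with a chosen errno. The helper name `deny_syscall_with_errno()` is
made up for illustration, and the sketch deliberately skips the
architecture check that a real filter should perform, so take it as an
outline rather than something production-ready:

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): install a seccomp
 * filter so that system call number `nr` is skipped by the kernel and
 * fails with errno `err`. Every other system call is allowed.
 * Returns 0 on success, -1 on failure.
 *
 * Assumption: a real filter should also validate seccomp_data.arch;
 * that check is omitted here to keep the sketch short.
 */
int deny_syscall_with_errno(int nr, int err)
{
    struct sock_filter filter[] = {
        /* Load the system call number into the accumulator. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* If it matches `nr`, fall through; otherwise skip one insn. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        /* Match: skip the system call and return `err` to user space. */
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | ((unsigned int)err & SECCOMP_RET_DATA)),
        /* No match: let the system call run normally. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required so that an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;

    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

For example, after `deny_syscall_with_errno(SYS_getppid, EPERM)`
succeeds, `syscall(SYS_getppid)` should return -1 with errno set to
EPERM. Note that the decision is taken once, at system call entry, and
the filter can only look at the system call number and raw argument
values, which is why this falls short of what a k(ret)probe-based
approach could do.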

For this reason, I think that ultimately having a way to alter a
subset of the kernel state (either by relaxing bpf_probe_write_user(),
by relaxing the verifier in a less ugly way than [2], or by somehow
interpreting the return value of a BPF kretprobe and substituting it
for the original return code before registers are restored) would be
pretty much the key enabler for such a feature. If someone else is
interested in this (even just to say that I'm proposing something very
idiotic), I would definitely be thrilled to hear opinions and invest
some time!

On the other hand, if you are willing to relax some of your
constraints and filter at the LSM hook level rather than at the system
call level (which still covers a lot of system calls), Landlock [3]
looks very promising. I for one learned a lot from reading its code. I
think the approach is very elegant, and most of the hooks are called
before any serious work is done in the system call, so it could
potentially enable your use case. It's not upstream yet, though.

I definitely don't know what the gurus think, but I think there would
be some value in pursuing ideas such as this one. Being able to
partially mutate some of the kernel state right from a k(ret)probe
could extend some of the great concepts that XDP introduced to the
networking stack to the system call path (and potentially other
subsystems) as well, although I fully recognize the analogy is a bit
of a stretch.

Thanks

[1] https://github.com/torvalds/linux/blob/master/arch/x86/entry/common.c#L65
[2] https://github.com/gianlucaborello/linux/commit/d1dd6bef91b408a76d4b458211dbc6a86476c9c6
[3] https://lwn.net/Articles/715203/