On 5/17/18 8:32 AM, Daniel Borkmann wrote:
On 05/16/2018 11:59 PM, Yonghong Song wrote:
On 5/16/18 4:27 AM, Peter Zijlstra wrote:
On Tue, May 15, 2018 at 04:45:16PM -0700, Yonghong Song wrote:
Currently, suppose a userspace application has loaded a bpf program
and attached it to a tracepoint/kprobe/uprobe, and a bpf
introspection tool, e.g., bpftool, wants to show which bpf program
is attached to which tracepoint/kprobe/uprobe. Such attachment
information will be really useful to understand the overall bpf
deployment in the system.
There is a name field (16 bytes) for each program, which could
be used to encode the attachment point. There are some drawbacks
for this approaches. First, bpftool user (e.g., an admin) may not
really understand the association between the name and the
attachment point. Second, if one program is attached to multiple
places, encoding a proper name which can imply all these
attachments becomes difficult.
This patch introduces a new bpf subcommand BPF_PERF_EVENT_QUERY.
Given a pid and fd, if the <pid, fd> is associated with a
tracepoint/kprobe/uprobea perf event, BPF_PERF_EVENT_QUERY will return
. tracepoint name, or
. k[ret]probe funcname + offset or kernel addr, or
. u[ret]probe filename + offset
to the userspace.
The user can use "bpftool prog" to find more information about
bpf program itself with prog_id.
Signed-off-by: Yonghong Song <y...@fb.com>
include/linux/trace_events.h | 15 ++++++
include/uapi/linux/bpf.h | 25 ++++++++++
kernel/bpf/syscall.c | 113
kernel/trace/bpf_trace.c | 53 ++++++++++++++++++++
kernel/trace/trace_kprobe.c | 29 +++++++++++
kernel/trace/trace_uprobe.c | 22 +++++++++
6 files changed, 257 insertions(+)
Why is the command called *_PERF_EVENT_* ? Are there not a lot of !perf
places to attach BPF proglets?
Just gave a complete picture, the below are major places to attach
. perf based (through perf ioctl)
. raw tracepoint based (through bpf interface)
. netlink interface for tc, xdp, tunneling
. setsockopt for socket filters
. cgroup based (bpf attachment subcommand)
mostly networking and io devices
. some other networking socket related (sk_skb stream/parser/verdict,
sk_msg verdict) through bpf attachment subcommand.
Currently, for cgroup based attachment, we have BPF_PROG_QUERY with input
cgroup file descriptor. For other networking based queries, we
may need to enumerate tc filters, networking devices, open sockets, etc.
to get the attachment information.
So to have one BPF_QUERY command line may be too complex to
cover all cases.
But you are right that BPF_PERF_EVENT_QUERY name is too narrow since
it should be used for other (pid, fd) based queries as well (e.g., socket, or
other potential uses in the future).
How about the subcommand name BPF_TASK_FD_QUERY and make bpf_attr.task_fd_query
I like the introspection output it provides in 7/7, it's really great!
So the query interface would only ever be tied to BPF progs whose attach
life time is tied to the life time of the application and as soon as all
refs on the fd are released it's unloaded from the system. BPF_TASK_FD_QUERY
seems okay to me, or something like BPF_ATTACH_QUERY. Even if the name is
slightly more generic, it might be more fitting with other cmds like
BPF_PROG_QUERY we have where we tell an attach point to retrieve all progs
from it (though only tied to cgroups right now, it may not be in future).
I think BPF_TASK_FD_QUERY is okay. Using BPF_ATTACH_QUERY indeed seems
a little bit broader to me as other query subcommands are possible to
query attachments with different input.
BPF_PROG_QUERY is also trying to query attachment. Currently, given a
cgroup fd, it will query prog array attached. Sean has the patch to
attach bpf programs to a RC device, and given a device fd, it will
query prog array attached to that device.
For all the others that are not strictly tied to the task but global, bpftool
would then need to be extended to query the various other interfaces like
netlink for retrieval which is on todo for some point in future as well. So
this set nicely complements this introspection aspect.