Re: [iovisor-dev] Extracting data from tracepoints (and anything else)

Andrii Nakryiko Sun, 05 Apr 2020 19:45:26 -0700

On Wed, Apr 1, 2020 at 12:52 PM <[email protected]> wrote:
>
> I've spent a few days trying to solve this issue I've had, and I've learned a 
> lot about both the past BPF APIs, and the new CO-RE API. I do have a couple 
> questions though.
>
> Once a CO-RE program is compiled and tested with the verifier, can it be run 
> on a kernel of the same version that isn't compiled with BTF?


I just answered this on another GitHub issue
(https://github.com/iovisor/bcc/issues/2855#issuecomment-609532793);
please check it there as well. Short answer: no, unless you can pretty
much guarantee that the target runs exactly the same compiled kernel
**binary** (not just the same version).
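
One quick way to tell whether a given machine can run a CO-RE program
is to check whether its kernel exposes its own BTF. A minimal
user-space sketch, assuming a kernel built with CONFIG_DEBUG_INFO_BTF
publishes it at /sys/kernel/btf/vmlinux; the helper name
btf_file_readable is made up for illustration:

```c
#include <unistd.h>

/* Conventional location of kernel BTF when CONFIG_DEBUG_INFO_BTF=y. */
#define KERNEL_BTF_PATH "/sys/kernel/btf/vmlinux"

/* Hypothetical helper: returns 1 if the file at `path` exists and is
 * readable by the caller, 0 otherwise. */
static int btf_file_readable(const char *path)
{
    return access(path, R_OK) == 0;
}
```

Calling btf_file_readable(KERNEL_BTF_PATH) and getting 0 suggests that
libbpf will have no kernel BTF to relocate against on that machine.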

> The CO-RE API is very nice, but in case that ends up only being able to run 
> on kernels with BTF support enabled, I've been trying to solve the original 
> issue found in this topic without the CO-RE approach. I'm still not able to 
> read the arguments from a given tracepoint. I'll put my code below. I'm sure 
> there are still plenty of issues and appreciate any time given to nudge me in 
> the right direction.
>
> #include <linux/bpf.h>
> #include "bpf_helpers.h"
>
> // To get kernel datatypes. Haven't figured out how to do this
> // without cloning the kernel source tree yet.
> #include "/kernel-src/tools/include/linux/types.h"

These should come from kernel-devel packages.

> #include <linux/version.h>
> #include <asm/ptrace.h>
> #include <unistd.h>
> #define MAX_CPUS 4
>
> struct bpf_map_def SEC("maps") events = {
>   .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
>   .key_size = sizeof(int),
>   .value_size = sizeof(u32),
>   .max_entries = MAX_CPUS,
> };

nit: this is a deprecated form of declaring maps; please see the kernel
selftests for better examples.
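
For reference, a sketch of the same map in the newer BTF-defined
style, assuming a recent libbpf whose headers are installed under
<bpf/...> and a build with clang -target bpf. As far as I know, libbpf
sizes a perf event array to the number of CPUs at load time when
max_entries is left out, so MAX_CPUS is not needed here:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* BTF-defined perf event array; the map is named "events" as in the
 * quoted program. max_entries is intentionally omitted (see above). */
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(int));
    __uint(value_size, sizeof(__u32));
} events SEC(".maps");
```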


>
> // Struct to pass data via perf buffer
> struct data_t {
>     u32 pid;
>     u32 tgid;
>     char program_name[16]; // max comm length is arbitrary

It's not arbitrary; it's set to 16 in the kernel (TASK_COMM_LEN).

>     char file[255];
> };
>
> struct sys_enter_openat_args {
>     // struct fields obtained from tplist.py output
>     long long pad;
>     int __syscall_nr;
>     int dfd;
>     const char * filename;
>     int flags;
>     __mode_t mode;  // used __mode_t instead of umode_t
> };

I haven't checked the order of the fields, but each field has to be long
in size (so 8 bytes on a 64-bit arch). BPF is a 64-bit arch, so long is
64 bits there. I'm not sure how this plays out on a 32-bit target
architecture, but assuming you are on x86-64, switch all the int fields
to long and make __mode_t long as well.
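
The effect of the field sizes on the layout can be checked in user
space with offsetof. The sketch below assumes an LP64 target such as
x86-64 (8-byte long and pointers) and applies the suggestion of
widening every int field to long. The struct names are made up for
illustration, and the field order is copied from the quoted code
without being verified against the tracepoint's format file:

```c
#include <stddef.h>

/* Layout from the quoted program: two adjacent 4-byte ints. */
struct openat_args_packed {
    long long pad;
    int __syscall_nr;
    int dfd;
    const char *filename;
    int flags;
    unsigned int mode;
};

/* Same fields, each widened to long. */
struct openat_args_wide {
    long long pad;
    long __syscall_nr;
    long dfd;
    const char *filename;
    long flags;
    long mode;
};

_Static_assert(sizeof(long) == 8, "sketch assumes an LP64 target");
/* With 4-byte ints, filename lands at offset 16 ... */
_Static_assert(offsetof(struct openat_args_packed, filename) == 16,
               "packed layout: filename at 16");
/* ... which is where dfd sits once every field is 8 bytes wide. */
_Static_assert(offsetof(struct openat_args_wide, dfd) == 16,
               "wide layout: dfd at 16");
_Static_assert(offsetof(struct openat_args_wide, filename) == 24,
               "wide layout: filename at 24");
```

If the file compiles, the offsets are as stated; the two definitions
disagree about where filename lives by exactly one 8-byte slot.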

>
> SEC("tracepoint/syscalls/sys_enter_openat")
> int bpf_prog(struct sys_enter_openat_args *ctx)
> {
>   struct data_t data = {};
>
>   data.pid = bpf_get_current_pid_tgid() >> 32;
>   data.tgid = bpf_get_current_pid_tgid();
>   bpf_get_current_comm(&data.program_name, sizeof(data.program_name));
>
>   int err = bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);
>
>   // debugging
>   char msg[] = "Probe read results: %d\n";
>   bpf_trace_printk(msg, sizeof(msg), ctx->err);

ctx->err doesn't exist according to the definition above; did you mean
the local variable err?

>
>   bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));

0 is not right here; use BPF_F_CURRENT_CPU (0xffffffffULL). Otherwise
you'll get data only from CPU #0 (and only if the tracepoint is
triggered on that CPU).
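
As a sketch of why 0 pins the output to CPU #0: the low 32 bits of the
flags argument select the slot in the perf event array, and the
special value BPF_F_CURRENT_CPU asks for the slot of the CPU the
program is running on. The function below is a user-space model of
that selection, not kernel code; the constants are local copies of the
values in the UAPI header, and the name resolve_perf_slot is made up:

```c
#include <stdint.h>

/* Local copies of the values defined in the kernel's <linux/bpf.h>. */
#define SKETCH_BPF_F_INDEX_MASK  0xffffffffULL
#define SKETCH_BPF_F_CURRENT_CPU SKETCH_BPF_F_INDEX_MASK

/* Model: which perf event array slot does a given flags value pick
 * when the program runs on `current_cpu`? */
static uint64_t resolve_perf_slot(uint64_t flags, uint32_t current_cpu)
{
    uint64_t index = flags & SKETCH_BPF_F_INDEX_MASK;

    if (index == SKETCH_BPF_F_CURRENT_CPU)
        index = current_cpu;
    return index;
}
```

In the BPF program itself the call then becomes
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &data, sizeof(data)).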

>
>   return 0;
> }
> char _license[] SEC("license") = "GPL";
> u32 _version SEC("version") = LINUX_VERSION_CODE;

_version is not necessary with a modern libbpf and kernel.

>
> With the above code, err = -14 and ctx->filename = -100.

This is due to the invalid memory layout of struct sys_enter_openat_args:
you are reading the wrong pointer. Sometimes the filename might not be in
memory and you will get -EFAULT (-14), but that should not happen all
the time, for sure.
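
To see where the -100 most likely comes from, here is a user-space
sketch that mimics the record. It assumes an LP64 target, assumes that
every syscall argument occupies an 8-byte slot, and assumes openat was
called with dfd = AT_FDCWD, which is -100 on Linux. The helper names
and the offsets 16 and 24 are assumptions for illustration, not values
read from the tracepoint's format file:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(long) == 8 && sizeof(intptr_t) == 8,
               "sketch assumes an LP64 target");

enum { DFD_OFF = 16, FILENAME_OFF = 24, RECORD_SIZE = 48 };

/* Value of AT_FDCWD on Linux ("relative to the current directory"). */
static const long at_fdcwd = -100;

/* Fill `rec` with dfd and filename in consecutive 8-byte slots. */
static void fill_record(unsigned char rec[RECORD_SIZE], long dfd,
                        const char *filename)
{
    memset(rec, 0, RECORD_SIZE);
    memcpy(rec + DFD_OFF, &dfd, sizeof(dfd));
    memcpy(rec + FILENAME_OFF, &filename, sizeof(filename));
}

/* Read a pointer-sized value at `off`, the way ctx->filename does for
 * whatever offset the struct definition assigns to that field. */
static intptr_t read_pointer_at(const unsigned char rec[RECORD_SIZE],
                                size_t off)
{
    intptr_t v;

    memcpy(&v, rec + off, sizeof(v));
    return v;
}
```

After fill_record(rec, at_fdcwd, path), reading at offset 16 (where
the quoted struct places filename) yields -100, while reading at
offset 24 yields the real pointer. Passing -100 as an address to
bpf_probe_read_str would then explain the -14 as well.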


> I took a look at an article written by Gianluca Borello 
> (https://sysdig.com/blog/the-art-of-writing-ebpf-programs-a-primer/) for 
> Sysdig's approach, and thought that using a raw tracepoint would be easier to 
> get the filename arg than the above approach. I tried it out, but couldn't 
> get it to compile.
> Here's the new function:
>
> SEC("raw_tracepoint/sys_enter")
> int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
> {
>   unsigned long syscall_id = ctx->args[1];
>   volatile struct pt_regs *regs;
>   volatile const char *pathname;
>
>   regs = (struct pt_regs *)ctx->args[0];
>   pathname = (const char *)regs->si;

Better to include the bpf_tracing.h header from libbpf and use
PT_REGS_PARM2_CORE(regs) instead of referencing the fields of pt_regs
directly.

>
>   struct data_t data = {};
>
>   data.pid = bpf_get_current_pid_tgid() >> 32;
>   data.tgid = bpf_get_current_pid_tgid();
>   bpf_get_current_comm(&data.program_name, sizeof(data.program_name));
>
>   char msg[] = "Probe read results: %d\n";
>   bpf_trace_printk(msg, sizeof(msg), syscall_id);
>
>   bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));
>
>   return 0;
> }
>
> With this code I get a compilation error:
> file_open_kern.c:77:34: error: no member named 'si' in 'struct pt_regs'
>   pathname = (const char *)regs->si;
>                                             ~~~~  ^
> This error is strange to me because ptrace.h does list %si as a valid field. 
> Perhaps I'm using the wrong header. Hopefully this is enough information to 
> be clear.

This is due to the different definitions of struct pt_regs in user space
and kernel space. Using libbpf's bpf_tracing.h header and the PT_REGS
macros should eliminate a lot of those mismatches. Sticking to vmlinux.h
also helps, but requires BPF CO-RE.
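
A rough sketch of the raw tracepoint variant written that way. It
assumes a CO-RE setup (a vmlinux.h generated with bpftool, a kernel
with BTF), a build with clang -O2 -g -target bpf -D__TARGET_ARCH_x86,
and x86-64, where openat is syscall number 257. The program name
trace_sys_enter is made up, and this has not been run against the
setup in this thread:

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

#define SKETCH_NR_OPENAT 257 /* __NR_openat on x86-64 (assumption) */

SEC("raw_tracepoint/sys_enter")
int trace_sys_enter(struct bpf_raw_tracepoint_args *ctx)
{
    /* For sys_enter, args[0] is the pt_regs pointer and args[1] the
     * syscall id, as in the quoted program. */
    struct pt_regs *regs = (struct pt_regs *)ctx->args[0];
    unsigned long syscall_id = ctx->args[1];
    char path[256];
    const char *pathname;
    long len;

    if (syscall_id != SKETCH_NR_OPENAT)
        return 0;

    /* Second syscall argument, read through the macro instead of
     * naming a pt_regs field directly. */
    pathname = (const char *)PT_REGS_PARM2_CORE(regs);

    /* On 5.5+ kernels bpf_probe_read_user_str is the better fit for
     * a user-space string. */
    len = bpf_probe_read_str(path, sizeof(path), pathname);
    bpf_printk("Probe read results: %ld\n", len);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```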

> If CO-RE compiled programs can run on non-BTF supported kernels, then I would 
> be more than happy to shift to that approach. Otherwise, it's nice to have 
> non-BTF reliant code.
>

No, unfortunately, it can't: CO-RE relocations are resolved against the
running kernel's BTF.

> As a final note, I was working through some examples for XDP in 
> https://github.com/xdp-project/xdp-tutorial and was thinking that something 
> similar would be helpful for general BPF programming. The API may be too 
> volatile at this point, but if people who have the technical expertise are 
> interested, I'm willing to donate some of my own time to help build something 
> similar. BCC's libbpf-tools has been extremely helpful, but it seems that 
> there's not any resources (I've found) that are as in-depth and cohesive as 
> the tutorial linked above. Again, I don't know if it's completely appropriate 
> at this stage of development, but I know there's a lot of interest out there 
> in using BPF at a more granular level and with less overhead than what is 
> offered with BCC.

I agree that such a tutorial is sorely missing. libbpf-tools and the
kernel selftests (not so much samples/bpf, though) are probably the best
way to see usage of all the newer features. It would be awesome for
someone to prepare an approachable and comprehensive set of tutorials,
of course. Please do give it a try, and the community will certainly
help by answering the questions you have!

> 
