Re: [iovisor-dev] Extracting data from tracepoints (and anything else)

Andrii Nakryiko Mon, 23 Mar 2020 11:32:11 -0700

On Mon, Mar 23, 2020 at 9:38 AM <[email protected]> wrote:
>
> I've been exploring the libbpf library for different versions of the Linux 
> kernel, and trying to rewrite some of the BCC tools. I would like to do more 
> work with CO-RE eventually, but I'm trying to understand the entire model of 
> how BPF programs work and how data flows between the kernel, the VM, and 
> userspace. I just started using perf buffers instead of bpf_trace_printk and 
> came across an issue that has me scratching my head. In the below code, I'm 
> not able to access the const char * arg in the tracepoint sys_enter_openat 
> (kernel 4.15). For some reason the verifier rejects this code. I think it's 
> valid C (although I'm a little bit rusty still) and I think I followed the 
> correct flow where data must be copied from the kernel to the VM before being 
> able to use.
>
> If anyone has insight to share, I'd much appreciate it. Conversely, if anyone 
> can point me in the direction of how to debug BPF programs that would be 
> extremely helpful too. Should I just dig into learning the basics of BPF asm?
>
> Highlights of the code:
>
> struct bpf_map_def SEC("maps") events = {
>   .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
>   .key_size = sizeof(int),
>   .value_size = sizeof(u32),
>   .max_entries = MAX_CPUS,
> };


nit: this is the legacy syntax for defining BPF maps; please see [0]
for some newer examples.

  [0] https://github.com/iovisor/bcc/tree/master/libbpf-tools
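
For reference, here is a minimal sketch of the same perf event array in
the newer BTF-defined style. It assumes a libbpf recent enough to ship
the __uint() macro in <bpf/bpf_helpers.h> and a build with
clang -target bpf, so it is a BPF-side fragment rather than a host
program. Recent libbpf versions should size a PERF_EVENT_ARRAY to the
number of possible CPUs when max_entries is left out; if yours does not,
set max_entries explicitly as in the quoted code.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Same map as the quoted bpf_map_def, declared in BTF-defined form. */
struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(__u32));
	/* max_entries omitted on purpose; assumed to default to nr of CPUs */
} events SEC(".maps");
```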

>
> struct sys_enter_openat_args {
>         u16 common_type;
>         u8 common_flags;
>         u8 common_preempt_count;
>         int common_pid;
>         int __syscall_nr;
>         int dfd;
>         char *filename;
>         int flags;
>         __mode_t mode;
> };
>
> SEC("tracepoint/syscalls/sys_enter_openat")
> int bpf_prog(struct sys_enter_openat_args *ctx) {
>   struct data_t data;
>   struct sys_enter_openat_args *args;
>
>   int res = bpf_probe_read(args, sizeof(ctx), ctx);

You don't need to bpf_probe_read() ctx here; you can just access its
members directly.
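
One thing to double-check before reading members directly is that the
struct matches the tracepoint's format file
(/sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format). As
far as I know, on x86_64 each syscall argument takes an 8-byte slot
starting at offset 16, which would put filename at offset 24, not 16 as
in the struct above. The host-side sketch below only compares the two
layouts with offsetof(); the struct and function names are made up for
illustration, and the 8-byte-slot layout is an assumption you should
verify against your kernel's format file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as declared in the quoted program, with fixed-width types. */
struct openat_args_as_posted {
	uint16_t common_type;
	uint8_t  common_flags;
	uint8_t  common_preempt_count;
	int32_t  common_pid;
	int32_t  syscall_nr;
	int32_t  dfd;
	uint64_t filename;	/* stands in for char * on a 64-bit kernel */
	int32_t  flags;
	uint32_t mode;
};

/* Assumed x86_64 layout: one 8-byte slot per syscall argument. */
struct openat_args_8byte_slots {
	uint64_t common;	/* common_type, flags, preempt_count, pid */
	int64_t  syscall_nr;	/* 4-byte field followed by 4 bytes of padding */
	int64_t  dfd;
	uint64_t filename;
	int64_t  flags;
	int64_t  mode;
};

/* Offset of filename in the struct as posted. */
size_t posted_filename_offset(void)
{
	return offsetof(struct openat_args_as_posted, filename);
}

/* Offset of filename when every argument occupies 8 bytes. */
size_t slots_filename_offset(void)
{
	return offsetof(struct openat_args_8byte_slots, filename);
}
```

If the format file on your kernel reports filename at offset 24, the
second layout is the one to use for direct ctx access.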

>   if(!res) {
>          data.file = "couldn't get file";
>   } else {
>          data.file = args->filename;

But here, if you want to read the filename contents themselves, you'll
need to use bpf_probe_read_str().

Having the data_t definition would also be helpful.
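
Putting the two points together, a rough sketch of the whole program
could look like the following. It is a BPF-side fragment (build with
clang -target bpf against libbpf headers), not something to compile for
the host. The data_t layout is hypothetical since the original was not
posted, the ctx struct assumes the 8-byte-per-argument x86_64 tracepoint
layout (check the format file for your kernel), and the BTF-defined map
assumes a reasonably recent libbpf.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical event layout; the original data_t was not shown. */
struct data_t {
	__u32 pid;
	char program_name[16];
	char file[256];
};

/* Assumed x86_64 layout: 8 bytes of common fields, then 8-byte slots. */
struct sys_enter_openat_args {
	unsigned long long common;
	long syscall_nr;
	long dfd;
	const char *filename;
	long flags;
	long mode;
};

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(__u32));
} events SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_prog(struct sys_enter_openat_args *ctx)
{
	struct data_t data = {};

	/* Upper 32 bits of pid_tgid hold the tgid (the user-visible PID). */
	data.pid = bpf_get_current_pid_tgid() >> 32;
	bpf_get_current_comm(data.program_name, sizeof(data.program_name));

	/* ctx->filename is read directly; only the string it points to
	 * has to be copied with bpf_probe_read_str(). */
	bpf_probe_read_str(data.file, sizeof(data.file), ctx->filename);

	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
			      &data, sizeof(data));
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```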

>   }
>
> Error Message:
>
> bpf_load_program() err=13
> 0: (bf) r6 = r1
> 1: (b7) r2 = 8
> 2: (bf) r3 = r6
> 3: (85) call bpf_probe_read#4
> R1 type=ctx expected=fp

This verifier error is quite misleading. What the verifier is actually
complaining about is that you pass an uninitialized pointer (args) as
the first parameter to bpf_probe_read(). But see above: you don't need
to bpf_probe_read() anything, and even if you wanted to, it would have
to be done very differently:

struct sys_enter_openat_args args; /* notice: not a pointer */
/* take the address of args and the size of args, not of a pointer */
bpf_probe_read(&args, sizeof(args), ctx);
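
The size mix-up is easy to see in plain host C. In the quoted code,
sizeof(ctx) is the size of a pointer, which is where the `r2 = 8` in the
verifier log comes from. The sketch below uses a made-up stand-in struct
(ctx_like) and made-up function names purely to contrast the two sizeof
expressions; it is not BPF code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the tracepoint context; only its size matters here. */
struct ctx_like {
	uint64_t slots[6];
};

/* What sizeof(ctx) yields when ctx is a pointer: the pointer's size. */
size_t size_via_pointer(struct ctx_like *ctx)
{
	return sizeof(ctx);
}

/* What sizeof(args) yields when args is the struct itself. */
size_t size_via_struct(void)
{
	struct ctx_like args;

	return sizeof(args);
}
```

On a typical 64-bit build the first function returns the pointer size
while the second returns the full 48 bytes of the struct, which is the
size you would want to copy.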

> The kernel didn't load the BPF program
>
>   data.pid = bpf_get_current_pid_tgid(); // use fn from libbpf.h to get 
> pid_tgid
>   bpf_get_current_comm(data.program_name, sizeof(data.program_name)); // puts 
> current comm into char array
>
>   bpf_perf_event_output(ctx, &events, 0, &data, sizeof(data));
>
>   return 0;
> }
>
> If more code would be helpful, I'm happy to share.
>
> I recognize that libbpf and CO-RE in later kernels provides an easier API for 
> dealing with char * (bpf_probe_read_str() I believe) but I'm trying to 
> understand what needs to be done to target different kernels and not just the 
> most cutting edge.
>
> As a second question, how much should I learn about perf(1) and its overlap 
> with BPF?
>
> Finally, for long-term monitoring solutions and passing readable data, do 
> most programs rely on pinning maps to the vfs instead of using perf buffers 
> or passing directly to a userspace process?

It's a mix. If your data can or should be pre-aggregated in the kernel,
using a map may benefit you, because you will be sending much less data
to user-space. But if you want to send every piece of information, then
perf_buffer is faster and more convenient than having user-space query
BPF maps all the time.
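
As an illustration of the pre-aggregation case, here is a rough
BPF-side sketch that counts openat() calls per process in a hash map,
so user-space only has to read the totals once in a while. The map
name, its size, and the program name are made up for the example; it
assumes BTF-defined maps via a recent libbpf and a build with
clang -target bpf.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical map: tgid -> number of openat() calls seen. */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, __u32);
	__type(value, __u64);
} opens_per_pid SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_openat")
int count_openat(void *ctx)
{
	__u32 pid = bpf_get_current_pid_tgid() >> 32;
	__u64 one = 1, *count;

	count = bpf_map_lookup_elem(&opens_per_pid, &pid);
	if (count) {
		/* Existing entry: bump the counter atomically in place. */
		__sync_fetch_and_add(count, 1);
		return 0;
	}

	/* First event for this pid. If another CPU inserts the key first,
	 * BPF_NOEXIST makes this update fail and one count is lost, which
	 * is acceptable for a sketch. */
	bpf_map_update_elem(&opens_per_pid, &pid, &one, BPF_NOEXIST);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```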

>
> Thanks for the patience and goodwill with a new systems dev. I've enjoyed my 
> interactions with the BPF community.

You're welcome. Check out libbpf-tools in the BCC repo; it should give
you some examples to work from.

>
> Tristan
> 

View/Reply Online (#1829): https://lists.iovisor.org/g/iovisor-dev/message/1829