On Fri, 23 Aug 2024 21:53:00 +0800
Tianyi Liu <[email protected]> wrote:
> U(ret)probes are designed to be filterable using the PID, which is the
> second parameter in the perf_event_open syscall. Currently, uprobe works
> well with the filtering, but uretprobe is not affected by it. This often
> leads to users being disturbed by events from uninterested processes while
> using uretprobe.
>
> We found that the filter function was not invoked when uretprobe was
> initially implemented, and this has been existing for ten years. We have
> tested the patch under our workload, binding eBPF programs to uretprobe
> tracepoints, and confirmed that it resolved our problem.
Is this eBPF related problem? It seems only perf record is also affected.
Let me try.
>
> Following are the steps to reproduce the issue:
>
> Step 1. Compile the following reproducer program:
> ```
>
> int main() {
> printf("pid: %d\n", getpid());
> while (1) {
> sleep(2);
> void *ptr = malloc(1024);
> free(ptr);
> }
> }
> ```
> We will then use uretprobe to trace the `malloc` function.
OK, and run perf probe to add an event on malloc's return.
$ sudo ~/bin/perf probe -x ./malloc-run --add malloc%return
Added new event:
probe_malloc:malloc__return (on malloc%return in
/home/mhiramat/ksrc/linux/malloc-run)
You can now use it in all perf tools, such as:
perf record -e probe_malloc:malloc__return -aR sleep 1
>
> Step 2. Run two instances of the reproducer program and record their PIDs.
$ ./malloc-run & ./malloc-run &
[1] 93927
[2] 93928
pid: 93927
pid: 93928
And trace one of them;
$ sudo ~/bin/perf trace record -e probe_malloc:malloc__return -p 93928
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.031 MB perf.data (9 samples) ]
And dump the data;
$ sudo ~/bin/perf script
malloc-run 93928 [004] 351736.730649: raw_syscalls:sys_exit: NR
230 = 0
malloc-run 93928 [004] 351736.730694: probe_malloc:malloc__return:
(561cfdeb30c0 <- 561cfdeb3204)
malloc-run 93928 [004] 351736.730696: raw_syscalls:sys_enter: NR
230 (0, 0, 7ffc7a5c5380, 7ffc7a5c5380, 561d2940f6b0,
malloc-run 93928 [004] 351738.730857: raw_syscalls:sys_exit: NR
230 = 0
malloc-run 93928 [004] 351738.730869: probe_malloc:malloc__return:
(561cfdeb30c0 <- 561cfdeb3204)
malloc-run 93928 [004] 351738.730883: raw_syscalls:sys_enter: NR
230 (0, 0, 7ffc7a5c5380, 7ffc7a5c5380, 561d2940f6b0,
malloc-run 93928 [004] 351740.731110: raw_syscalls:sys_exit: NR
230 = 0
malloc-run 93928 [004] 351740.731125: probe_malloc:malloc__return:
(561cfdeb30c0 <- 561cfdeb3204)
malloc-run 93928 [004] 351740.731127: raw_syscalls:sys_enter: NR
230 (0, 0, 7ffc7a5c5380, 7ffc7a5c5380, 561d2940f6b0,
Hmm, it seems to trace one pid data. (without this change)
If this changes eBPF behavior, I would like to involve eBPF people to ask
this is OK. As far as from the viewpoint of perf tool, current code works.
But I agree that current code is a bit strange. Oleg, do you know anything?
Thank you,
>
> Step 3. Use uretprobe to trace each of the two running reproducers
> separately. We use bpftrace to make it easier to reproduce. Please run two
> instances of bpftrace simultaneously: the first instance filters events
> from PID1, and the second instance filters events from PID2.
>
> The expected behavior is that each bpftrace instance would only print
> events matching its respective PID filter. However, in practice, both
> bpftrace instances receive events from both processes, the PID filter is
> ineffective at this moment:
>
> Before:
> ```
> PID1=55256
> bpftrace -p $PID1 -e 'uretprobe:libc:malloc { printf("time=%llu pid=%d\n",
> elapsed / 1000000000, pid); }'
> Attaching 1 probe...
> time=0 pid=55256
> time=2 pid=55273
> time=2 pid=55256
> time=4 pid=55273
> time=4 pid=55256
> time=6 pid=55273
> time=6 pid=55256
>
> PID2=55273
> bpftrace -p $PID2 -e 'uretprobe:libc:malloc { printf("time=%llu pid=%d\n",
> elapsed / 1000000000, pid); }'
> Attaching 1 probe...
> time=0 pid=55273
> time=0 pid=55256
> time=2 pid=55273
> time=2 pid=55256
> time=4 pid=55273
> time=4 pid=55256
> time=6 pid=55273
> time=6 pid=55256
> ```
>
> After: Both bpftrace instances will show the expected behavior, only
> printing events from the PID specified by their respective filters:
> ```
> PID1=1621
> bpftrace -p $PID1 -e 'uretprobe:libc:malloc { printf("time=%llu pid=%d\n",
> elapsed / 1000000000, pid); }'
> Attaching 1 probe...
> time=0 pid=1621
> time=2 pid=1621
> time=4 pid=1621
> time=6 pid=1621
>
> PID2=1633
> bpftrace -p $PID2 -e 'uretprobe:libc:malloc { printf("time=%llu pid=%d\n",
> elapsed / 1000000000, pid); }'
> Attaching 1 probe...
> time=0 pid=1633
> time=2 pid=1633
> time=4 pid=1633
> time=6 pid=1633
> ```
>
> Fixes: c1ae5c75e103 ("uprobes/tracing: Introduce is_ret_probe() and
> uretprobe_dispatcher()")
> Cc: Alban Crequy <[email protected]>
> Signed-off-by: Francis Laniel <[email protected]>
> Signed-off-by: Tianyi Liu <[email protected]>
> ---
> Changes in v2:
> - Drop cover letter and update commit message.
> - Link to v1:
> https://lore.kernel.org/linux-trace-kernel/me0p300mb04166144cdf92a72b9e1baea9d...@me0p300mb0416.ausp300.prod.outlook.com/
> ---
> kernel/trace/trace_uprobe.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index c98e3b3386ba..c7e2a0962928 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -1443,6 +1443,9 @@ static void uretprobe_perf_func(struct trace_uprobe
> *tu, unsigned long func,
> struct pt_regs *regs,
> struct uprobe_cpu_buffer **ucbp)
> {
> + if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
> + return;
> +
> __uprobe_perf_func(tu, func, regs, ucbp);
> }
>
> --
> 2.34.1
>
--
Masami Hiramatsu (Google) <[email protected]>