The whole discussion was very confusing (yes, I too contributed to the
confusion ;), let me try to summarise.

> U(ret)probes are designed to be filterable using the PID, which is the
> second parameter in the perf_event_open syscall. Currently, uprobe works
> well with the filtering, but uretprobe is not affected by it.

And this is correct. But the CONFIG_BPF_EVENTS code in __uprobe_perf_func()
misunderstands the purpose of uprobe_perf_filter().

Lets forget about BPF for the moment. It is not that uprobe_perf_filter()
does the filtering by the PID, it doesn't. We can simply kill this function
and perf will work correctly. The perf layer in __uprobe_perf_func() does
the filtering when perf_event->hw.target != NULL.

So why does uprobe_perf_filter() call uprobe_perf_filter()? Not to avoid
the __uprobe_perf_func() call (as the BPF code assumes), but to trigger
unapply_uprobe() in handler_chain().

Suppose you do, say,

        $ perf probe -x /path/to/libc some_hot_function
or
        $ perf probe -x /path/to/libc some_hot_function%return
then
        $perf record -e ... -p 1

to trace the usage of some_hot_function() in the init process. Everything
will work just fine if we kill uprobe_perf_filter()->uprobe_perf_filter().

But. If INIT forks a child C, dup_mm() will copy int3 installed by perf.
So the child C will hit this breakpoint and cal handle_swbp/etc for no
reason every time it calls some_hot_function(), not good.

That is why uprobe_perf_func() calls uprobe_perf_filter() which returns
UPROBE_HANDLER_REMOVE when C hits the breakpoint. handler_chain() will
call unapply_uprobe() which will remove this breakpoint from C->mm.

> We found that the filter function was not invoked when uretprobe was
> initially implemented, and this has been existing for ten years.

See above, this is correct.

Note also that if you only use perf-probe + perf-record, no matter how
many instances, you can even add BUG_ON(!uprobe_perf_filter(...)) into
uretprobe_perf_func(). IIRC, perf doesn't use create_local_trace_uprobe().

---------------------------------------------------------------------------
Now lets return to BPF and this particular problem. I won't really argue
with this patch, but

        - Please change the subject and update the changelog,
          "Fixes: c1ae5c75e103" and the whole reasoning is misleading
          and wrong, IMO.

        - This patch won't fix all problems because uprobe_perf_filter()
          filters by mm, not by pid. The changelog/patch assumes that it
          is a "PID filter", but it is not.

          See 
https://lore.kernel.org/linux-trace-kernel/[email protected]/
          If the traced process does clone(CLONE_VM), bpftrace will hit the
          similar problem, with uprobe or uretprobe.

        - So I still think that the "right" fix should change the
          bpf_prog_run_array_uprobe() paths somehow, but I know nothing
          about bpf.

Oleg.


Reply via email to