Frederik,

The prohibitive uprobe overhead you are seeing (even for sampling)
has been discussed on this list before. The idea is to adopt a
mechanism similar to dyninst or kerninst. We (Brenden, myself, and
perhaps others) are starting to explore this space.

Yonghong

On Mon, Jul 17, 2017 at 4:37 PM, Frederik Deweerdt via iovisor-dev
<[email protected]> wrote:
> Hello,
>
> I'm trying to use uprobes in order to be able to keep track of how
> long it takes to execute a given function in a shared library used by
> tens of thousands of threads on a 32-core machine. Unfortunately, I'm
> seeing a 3x slowdown when setting the uprobes (even ones that only do
> a `return 0;` in the body).
>
> Looking at a perf record, it looks like lock contention is the
> culprit: queued_spin_lock_slowpath accounts for more than 20% of the
> workload.
>
> I'm wondering if there's a way around that. For my use case, sampling
> would be an option, but I haven't found a way to do so without
> actually entering the probe (which has a prohibitive cost by itself).
>
> Any thoughts?
> Frederik
> _______________________________________________
> iovisor-dev mailing list
> [email protected]
> https://lists.iovisor.org/mailman/listinfo/iovisor-dev