Hello,

I'm trying to use uprobes to measure how long a given function in a
shared library takes to execute. The library is used by tens of
thousands of threads on a 32-core machine. Unfortunately, I'm seeing a
3x slowdown as soon as the uprobes are attached, even ones whose body
is just `return 0;`.

A `perf record` profile points to lock contention as the culprit:
queued_spin_lock_slowpath accounts for more than 20% of the profile.

Is there a way around this? For my use case, sampling would be an
option, but I haven't found a way to sample without actually entering
the probe, which is prohibitively expensive on its own.

Any thoughts?
Frederik
_______________________________________________
iovisor-dev mailing list
[email protected]
https://lists.iovisor.org/mailman/listinfo/iovisor-dev