Hello, I'm trying to use uprobes to measure how long a given function in a shared library takes to execute. The library is used by tens of thousands of threads on a 32-core machine. Unfortunately, I'm seeing a 3x slowdown as soon as the uprobes are attached, even when the probe body does nothing but `return 0;`.
Looking at a perf record profile, lock contention appears to be the culprit: queued_spin_lock_slowpath accounts for more than 20% of the workload. Is there a way around this? For my use case, sampling would be an option, but I haven't found a way to sample without actually entering the probe, which is prohibitively expensive by itself. Any thoughts?

Frederik

_______________________________________________
iovisor-dev mailing list
https://lists.iovisor.org/mailman/listinfo/iovisor-dev
