When a caller already guards a tracepoint with an explicit enabled check:

	if (trace_foo_enabled() && cond)
		trace_foo(args);

trace_foo() internally re-evaluates the same static key through
static_branch_unlikely(). Because static branches are instructions patched
at runtime, the compiler cannot fold the two evaluations, so every such
site pays the cost twice.
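A userspace model can make the double evaluation visible. This is a sketch under stated assumptions, not kernel code: the static key is replaced by a plain bool, and hypothetical counters (foo_key_checks, foo_events) and a hypothetical guarded_site() helper record each time the key is tested. In the kernel the test is a patched jump or NOP rather than a load, but it is still executed once per evaluation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model, not kernel code: the static key is a plain bool and
 * every test of it is counted. All names here are hypothetical.
 */
static bool foo_key_enabled = true; /* stands in for the static key */
static int foo_key_checks;          /* number of times the key was tested */
static int foo_events;              /* number of events recorded */

static bool key_check(void)
{
	foo_key_checks++;
	return foo_key_enabled;
}

/* Models trace_foo_enabled(): one key test. */
static bool trace_foo_enabled(void)
{
	return key_check();
}

/* Models trace_foo(): tests the key again before recording the event. */
static void trace_foo(int arg)
{
	(void)arg;
	if (key_check())
		foo_events++;
}

/* The guarded call site; returns how many key tests it performed. */
static int guarded_site(int arg, bool cond)
{
	foo_key_checks = 0;
	if (trace_foo_enabled() && cond)
		trace_foo(arg);
	return foo_key_checks;
}
```

With the key enabled and cond true, guarded_site() performs two key tests to record a single event; with the key disabled, short-circuiting leaves it at one.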
This series introduces trace_invoke_##name() as a companion to
trace_##name(). It calls __do_trace_##name() directly, bypassing the
redundant static-branch re-check, while preserving all other correctness
properties of the normal path (RCU-watching assertion, might_fault() for
syscall tracepoints). The internal __do_trace_##name() symbol is not
leaked to call sites; trace_invoke_##name() is the only new public API.

	if (trace_foo_enabled() && cond)
		trace_invoke_foo(args);	/* calls __do_trace_foo() directly */

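The shape of the new API can be sketched with a userspace macro. MODEL_DECLARE_TRACE, rcu_watching, foo_key, foo_hits, key_checks and guarded_site() are hypothetical stand-ins, and the real __DECLARE_TRACE in include/linux/tracepoint.h differs in detail. The sketch only shows that trace_invoke_foo() goes straight to __do_trace_foo() while keeping a correctness assertion, so the converted guarded site tests the key once.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel-style wrappers so a multi-argument prototype is one macro argument. */
#define TP_PROTO(...) __VA_ARGS__
#define TP_ARGS(...)  __VA_ARGS__

static bool rcu_watching = true; /* hypothetical stand-in for rcu_is_watching() */
static int key_checks;           /* counts tests of the modeled static key */

/* Hypothetical model of the tracepoint macro; not the kernel's __DECLARE_TRACE. */
#define MODEL_DECLARE_TRACE(name, proto, args)                          \
	static bool name##_key = true; /* models the static key */      \
	static int name##_hits;        /* events emitted */             \
	/* Internal emitter: call sites are not meant to use it. */     \
	static inline void __do_trace_##name(proto)                      \
	{                                                                \
		name##_hits++;                                           \
	}                                                                \
	static inline bool trace_##name##_enabled(void)                 \
	{                                                                \
		key_checks++;                                            \
		return name##_key;                                       \
	}                                                                \
	/* Normal path: tests the key, then emits. */                   \
	static inline void trace_##name(proto)                           \
	{                                                                \
		if (trace_##name##_enabled()) {                          \
			assert(rcu_watching);                            \
			__do_trace_##name(args);                         \
		}                                                        \
	}                                                                \
	/* New path: the caller already tested the key. */              \
	static inline void trace_invoke_##name(proto)                    \
	{                                                                \
		assert(rcu_watching); /* correctness check is kept */   \
		__do_trace_##name(args);                                 \
	}

MODEL_DECLARE_TRACE(foo, TP_PROTO(int x), TP_ARGS(x))

/* Guarded call site after conversion; returns the number of key tests. */
static int guarded_site(int x, bool cond)
{
	key_checks = 0;
	if (trace_foo_enabled() && cond)
		trace_invoke_foo(x); /* no second key test */
	return key_checks;
}
```

Only __do_trace_foo() does the emitting; both public entry points funnel into it, which mirrors the goal of keeping the internal symbol out of call sites.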
The first patch adds trace_invoke_##name() in three places in
include/linux/tracepoint.h: __DECLARE_TRACE, __DECLARE_TRACE_SYSCALL,
and the !TRACEPOINTS_ENABLED stub. The remaining 14 patches
mechanically convert all guarded call sites found in the tree:
kernel/, io_uring/, net/, accel/habanalabs, cpufreq/, devfreq/,
dma-buf/, fsi/, drm/, HID, i2c/, spi/, scsi/ufs/, and btrfs/.
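The !TRACEPOINTS_ENABLED location matters because converted call sites must still build when tracepoints are compiled out. A sketch of that case, using a hypothetical MODEL_DECLARE_TRACE_STUB macro rather than the real one: every generated function is an empty inline and the enabled check returns false, so the whole guarded block is dead code. The reached counter and guarded_site() are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TP_PROTO(...) __VA_ARGS__
#define TP_ARGS(...)  __VA_ARGS__

/*
 * Hypothetical model of the tracing-disabled stub, not the kernel macro.
 * Every generated function is an empty inline; the "args" parameter is
 * accepted but unused, since nothing is ever emitted.
 */
#define MODEL_DECLARE_TRACE_STUB(name, proto, args)                     \
	static inline bool trace_##name##_enabled(void)                 \
	{                                                                \
		return false;                                            \
	}                                                                \
	static inline void trace_##name(proto) { }                      \
	/* Without this stub, converted call sites would not compile. */\
	static inline void trace_invoke_##name(proto) { }

MODEL_DECLARE_TRACE_STUB(foo, TP_PROTO(int x, int y), TP_ARGS(x, y))

static int reached; /* counts entries into the guarded block */

/* A converted call site; it builds unchanged with tracing compiled out. */
static void guarded_site(int x, int y)
{
	if (trace_foo_enabled() && x > y) {
		reached++; /* never runs: the stub reports "disabled" */
		trace_invoke_foo(x, y);
	}
}
```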
This series is motivated by Peter Zijlstra's observation, in the discussion
around Dmitry Ilvokhin's locking tracepoint instrumentation series, that
compilers cannot optimize static branches, so guarded call sites end up
evaluating the static branch twice for no reason. It also follows Steven
Rostedt's suggestion to add a proper API instead of exposing internal
implementation details such as __do_trace_##name() directly to call sites:
https://lore.kernel.org/linux-trace-kernel/8298e098d3418cb446ef396f119edac58a3414e9.1772642407.gi...@ilvokhin.com
Suggested-by: Steven Rostedt <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Vineeth Pillai (Google) (15):
tracepoint: Add trace_invoke_##name() API
kernel: Use trace_invoke_##name() at guarded tracepoint call sites
io_uring: Use trace_invoke_##name() at guarded tracepoint call sites
net: Use trace_invoke_##name() at guarded tracepoint call sites
accel/habanalabs: Use trace_invoke_##name() at guarded tracepoint call sites
cpufreq: Use trace_invoke_##name() at guarded tracepoint call sites
devfreq: Use trace_invoke_##name() at guarded tracepoint call sites
dma-buf: Use trace_invoke_##name() at guarded tracepoint call sites
fsi: Use trace_invoke_##name() at guarded tracepoint call sites
drm: Use trace_invoke_##name() at guarded tracepoint call sites
HID: Use trace_invoke_##name() at guarded tracepoint call sites
i2c: Use trace_invoke_##name() at guarded tracepoint call sites
spi: Use trace_invoke_##name() at guarded tracepoint call sites
scsi: ufs: Use trace_invoke_##name() at guarded tracepoint call sites
btrfs: Use trace_invoke_##name() at guarded tracepoint call sites
drivers/accel/habanalabs/common/device.c | 12 ++++++------
drivers/accel/habanalabs/common/mmu/mmu.c | 3 ++-
drivers/accel/habanalabs/common/pci/pci.c | 4 ++--
drivers/cpufreq/amd-pstate.c | 10 +++++-----
drivers/cpufreq/cpufreq.c | 2 +-
drivers/cpufreq/intel_pstate.c | 2 +-
drivers/devfreq/devfreq.c | 2 +-
drivers/dma-buf/dma-fence.c | 4 ++--
drivers/fsi/fsi-master-aspeed.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
drivers/gpu/drm/scheduler/sched_entity.c | 4 ++--
drivers/hid/intel-ish-hid/ipc/pci-ish.c | 2 +-
drivers/i2c/i2c-core-slave.c | 2 +-
drivers/spi/spi-axi-spi-engine.c | 4 ++--
drivers/ufs/core/ufshcd.c | 12 ++++++------
fs/btrfs/extent_map.c | 4 ++--
fs/btrfs/raid56.c | 4 ++--
include/linux/tracepoint.h | 11 +++++++++++
io_uring/io_uring.h | 2 +-
kernel/irq_work.c | 2 +-
kernel/sched/ext.c | 2 +-
kernel/smp.c | 2 +-
net/core/dev.c | 2 +-
net/core/xdp.c | 2 +-
net/openvswitch/actions.c | 2 +-
net/openvswitch/datapath.c | 2 +-
net/sctp/outqueue.c | 2 +-
net/tipc/node.c | 2 +-
30 files changed, 62 insertions(+), 50 deletions(-)