On Tue, Dec 12, 2023 at 5:29 AM Håkon Bugge <[email protected]> wrote:
>
> For the most time-consuming function, when running a syscall benchmark
> with STIG compliant audit rules:
>
>   Overhead  Command       Shared Object      Symbol
>   .........  ............  .................  ........................
>
>     27.62%  syscall_lat   [kernel.kallsyms]  [k] __audit_filter_op
>
> we apply codegen optimizations, which speed up the syscall
> performance by around 17% on an Intel Cascade Lake system.
>
> We run "perf stat -d -r 5 ./syscall_lat", where syscall_lat is a C
> application that measures average syscall latency from getpid()
> running 100 million rounds.
>
> Between each perf run, we reboot the system and wait until the last
> minute load is less than 1.0.
>
> We boot the kernel, v6.6-rc4, with "mitigations=off", in order to
> amplify the changes in the audit system.
>
> Let the base kernel be v6.6-rc4 booted with "audit=1" and
> "mitigations=off" and with the commit "audit: Vary struct audit_entry
> alignment" on an Intel Cascade Lake system. The following three
> metrics are reported: nanoseconds per syscall, L1D misses per syscall,
> and finally Instructions Per Cycle, ipc.
>
> Base vs. base + this commit gives:
>
> ns per call:
>      min avg max   pstdev
> -    203 203 209 0.954149
> +    173 173 178 0.884534
>
> L1d misses per syscall:
>      min   avg   max   pstdev
> -  0.012 0.103 0.817 0.238352
> +  0.010 0.209 1.235 0.399416
>
> ipc:
>      min   avg   max   pstdev
> -  2.320 2.329 2.330 0.003000
> +  2.430 2.436 2.440 0.004899
>
> Signed-off-by: Håkon Bugge <[email protected]>
> ---
>  kernel/auditsc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 6f0d6fb6523fa..84d0dfe75a4ac 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -822,6 +822,7 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
>   * parameter can be NULL, but all others must be specified.
>   * Returns 1/true if the filter finds a match, 0/false if none are found.
>   */
> +#pragma GCC optimize("unswitch-loops", "align-loops=16", "align-jumps=16")

The kernel doesn't really make use of #pragma optimization statements
like this, at least not in any of the core areas, and I'm not
interested in being the first to do so.  I appreciate the time and
effort that you have spent profiling the audit subsystem, but this
isn't a patch I can accept at this point in time, I'm sorry.

>  static int __audit_filter_op(struct task_struct *tsk,
>                             struct audit_context *ctx,
>                             struct list_head *list,
> @@ -841,6 +842,7 @@ static int __audit_filter_op(struct task_struct *tsk,
>         }
>         return 0;
>  }
> +#pragma GCC reset_options
>
>  /**
>   * audit_filter_uring - apply filters to an io_uring operation
> --
> 2.39.3

-- 
paul-moore.com
