On Fri, Oct 30, 2020 at 7:18 AM YiFei Zhu <zhuyifei1...@gmail.com> wrote:
> I got a bare metal test machine with Intel(R) Xeon(R) CPU E5-2660 v3 @
> 2.60GHz, running Ubuntu 18.04. Test kernels are compiled at
> 57a339117e52 ("selftests/seccomp: Compare bitmap vs filter overhead")
> and 3650b228f83a ("Linux 5.10-rc1"), built with Ubuntu's
> 5.3.0-64-generic's config, then `make olddefconfig`. "Mitigations off"
> indicate the kernel was booted with "nospectre_v2 nospectre_v1
> no_stf_barrier tsx=off tsx_async_abort=off".
>
> The benchmark was single-job make on x86_64 defconfig of 5.9.1, with
> CPU affinity to set only processor #0. Raw results are appended below.
> Each boot is tested by running the build directly and inside docker,
> with and without seccomp. The commands used are attached below. Each
> test is 4 trials, with the middle two (non-minimum, non-maximum) wall
> clock time averaged. Results summary:
>
>                      Mitigations On               Mitigations Off
>                 With Cache  Without Cache    With Cache  Without Cache
> Native          18:17.38    18:13.78         18:16.08    18:15.67
> D. no seccomp   18:15.54    18:17.71         18:17.58    18:16.75
> D. + seccomp    20:42.47    20:45.04         18:47.67    18:49.01
>
> To be honest, I'm somewhat surprised that it didn't produce as much of
> a dent in the seccomp overhead in this macro benchmark as I had
> expected.
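
For anyone who wants to recheck the averaging described in the quoted message, here is a minimal Python sketch of the middle-two mean. The helper names (to_centiseconds, format_centiseconds, middle_two_mean) are made up for illustration and are not part of the actual benchmark scripts. It assumes elapsed times in the "M:SS.cc" form printed by time, and the four sample values are the elapsed times of one four-trial group from the raw output at the end of this mail.

```python
def to_centiseconds(elapsed: str) -> int:
    """Convert an 'M:SS.cc' wall-clock string to integer centiseconds.

    Assumes exactly two fractional digits, as in the raw output.
    """
    minutes, seconds = elapsed.split(":")
    whole, frac = seconds.split(".")
    return (int(minutes) * 60 + int(whole)) * 100 + int(frac)


def format_centiseconds(cs: int) -> str:
    """Format integer centiseconds back to 'M:SS.cc'."""
    minutes, rest = divmod(cs, 6000)
    seconds, frac = divmod(rest, 100)
    return f"{minutes}:{seconds:02d}.{frac:02d}"


def middle_two_mean(trials):
    """Drop the minimum and maximum of four trials, average the other two."""
    if len(trials) != 4:
        raise ValueError("expected exactly 4 trials")
    ordered = sorted(to_centiseconds(t) for t in trials)
    # Integer division truncates a trailing half centisecond.
    return format_centiseconds((ordered[1] + ordered[2]) // 2)


trials = ["17:44.56", "17:43.40", "17:42.39", "17:43.44"]
print(middle_two_mean(trials))  # prints 17:43.42
```

The result is the 17:43.42 Native entry in the updated summary table below. Working in integer centiseconds avoids floating-point rounding surprises when formatting the average.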

My peers pointed out that my previous benchmark still left a few mitigations on, and suggested using "noibrs noibpb nopti nospectre_v2 nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off tsx=on tsx_async_abort=off mitigations=off". Results with "Mitigations Off" updated:

                     Mitigations On               Mitigations Off
                With Cache  Without Cache    With Cache  Without Cache
Native          18:17.38    18:13.78         17:43.42    17:47.68
D. no seccomp   18:15.54    18:17.71         17:34.59    17:37.54
D. + seccomp    20:42.47    20:45.04         17:35.70    17:37.16

With all mitigations off, whether seccomp is on or off does not seem to make much of a difference for this benchmark. Enabling the bitmap does seem to decrease the overall compilation time, but the decrease also shows up where seccomp is off, so the speedup is probably from other factors. We are thinking about using more syscall-intensive workloads, such as httpd.

Though, this does make me wonder: where does the 3-minute overhead with seccomp with mitigations come from? Is it data cache misses? If that is the case, can we somehow preload the seccomp bitmap cache maybe? I mean, mitigations only cause around half a minute of slowdown without seccomp, but seccomp somehow amplifies the slowdown by an additional 2.5 minutes, so something must be off here.
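
To make the arithmetic behind that last paragraph explicit, here is a rough Python sketch using the "With Cache" docker rows of the table above. The variable and helper names are made up for illustration; only the four times come from the table.

```python
def to_centiseconds(elapsed: str) -> int:
    """Convert an 'M:SS.cc' wall-clock string to integer centiseconds."""
    minutes, seconds = elapsed.split(":")
    whole, frac = seconds.split(".")
    return (int(minutes) * 60 + int(whole)) * 100 + int(frac)


# "With Cache" column, docker rows, mitigations on vs. off.
docker_on = to_centiseconds("18:15.54")    # D. no seccomp, mitigations on
docker_off = to_centiseconds("17:34.59")   # D. no seccomp, mitigations off
seccomp_on = to_centiseconds("20:42.47")   # D. + seccomp, mitigations on
seccomp_off = to_centiseconds("17:35.70")  # D. + seccomp, mitigations off

# Slowdown caused by mitigations, without and with seccomp.
mitigation_cost = docker_on - docker_off
seccomp_mitigation_cost = seccomp_on - seccomp_off

# Part of the slowdown that only appears once seccomp is enabled.
extra = seccomp_mitigation_cost - mitigation_cost

print(f"mitigations, no seccomp:   {mitigation_cost / 100:.2f} s")          # 40.95 s
print(f"mitigations, with seccomp: {seccomp_mitigation_cost / 100:.2f} s")  # 186.77 s
print(f"extra with seccomp:        {extra / 100:.2f} s")                    # 145.82 s
```

So roughly 41 s of slowdown from mitigations alone grows to about 187 s (a bit over 3 minutes) once seccomp is enabled, an extra 146 s or so, which is where the "additional 2.5 minutes" comes from.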

This is the raw output for the time commands:

==== with cache, mitigations off ====
947.02user 108.62system 17:47.65elapsed 98%CPU (0avgtext+0avgdata 239804maxresident)k
25112inputs+217152outputs (166major+51934447minor)pagefaults 0swaps
947.91user 108.20system 17:46.53elapsed 99%CPU (0avgtext+0avgdata 239576maxresident)k
0inputs+217152outputs (0major+51941524minor)pagefaults 0swaps
948.33user 108.70system 17:47.72elapsed 98%CPU (0avgtext+0avgdata 239604maxresident)k
0inputs+217152outputs (0major+51938566minor)pagefaults 0swaps
948.65user 108.81system 17:48.41elapsed 98%CPU (0avgtext+0avgdata 239692maxresident)k
0inputs+217152outputs (0major+51935349minor)pagefaults 0swaps
932.12user 113.68system 17:37.24elapsed 98%CPU (0avgtext+0avgdata 239660maxresident)k
0inputs+217152outputs (0major+51547571minor)pagefaults 0swaps
931.69user 114.12system 17:37.84elapsed 98%CPU (0avgtext+0avgdata 239448maxresident)k
0inputs+217152outputs (0major+51539964minor)pagefaults 0swaps
932.25user 113.39system 17:37.75elapsed 98%CPU (0avgtext+0avgdata 239372maxresident)k
0inputs+217152outputs (0major+51538018minor)pagefaults 0swaps
931.09user 114.25system 17:37.34elapsed 98%CPU (0avgtext+0avgdata 239508maxresident)k
0inputs+217152outputs (0major+51537700minor)pagefaults 0swaps
929.96user 113.42system 17:36.23elapsed 98%CPU (0avgtext+0avgdata 239448maxresident)k
984inputs+217152outputs (22major+51544059minor)pagefaults 0swaps
929.73user 115.13system 17:38.09elapsed 98%CPU (0avgtext+0avgdata 239464maxresident)k
0inputs+217152outputs (0major+51540259minor)pagefaults 0swaps
930.13user 112.71system 17:36.17elapsed 98%CPU (0avgtext+0avgdata 239620maxresident)k
0inputs+217152outputs (0major+51540623minor)pagefaults 0swaps
930.57user 113.02system 17:49.70elapsed 97%CPU (0avgtext+0avgdata 239432maxresident)k
0inputs+217152outputs (0major+51537776minor)pagefaults 0swaps

==== without cache, mitigations off ====
947.59user 108.06system 17:44.56elapsed 99%CPU (0avgtext+0avgdata 239484maxresident)k
25112inputs+217152outputs (167major+51938723minor)pagefaults 0swaps
947.95user 108.58system 17:43.40elapsed 99%CPU (0avgtext+0avgdata 239580maxresident)k
0inputs+217152outputs (0major+51943434minor)pagefaults 0swaps
948.54user 106.62system 17:42.39elapsed 99%CPU (0avgtext+0avgdata 239608maxresident)k
0inputs+217152outputs (0major+51936408minor)pagefaults 0swaps
947.85user 107.92system 17:43.44elapsed 99%CPU (0avgtext+0avgdata 239656maxresident)k
0inputs+217152outputs (0major+51931633minor)pagefaults 0swaps
931.28user 111.16system 17:33.59elapsed 98%CPU (0avgtext+0avgdata 239440maxresident)k
0inputs+217152outputs (0major+51543540minor)pagefaults 0swaps
930.21user 112.56system 17:34.20elapsed 98%CPU (0avgtext+0avgdata 239400maxresident)k
0inputs+217152outputs (0major+51539699minor)pagefaults 0swaps
930.16user 113.74system 17:35.06elapsed 98%CPU (0avgtext+0avgdata 239344maxresident)k
0inputs+217152outputs (0major+51543072minor)pagefaults 0swaps
930.17user 112.77system 17:34.98elapsed 98%CPU (0avgtext+0avgdata 239176maxresident)k
0inputs+217152outputs (0major+51540777minor)pagefaults 0swaps
931.92user 113.31system 17:36.05elapsed 98%CPU (0avgtext+0avgdata 239520maxresident)k
984inputs+217152outputs (22major+51534636minor)pagefaults 0swaps
931.14user 112.81system 17:35.35elapsed 98%CPU (0avgtext+0avgdata 239524maxresident)k
0inputs+217152outputs (0major+51549007minor)pagefaults 0swaps
930.93user 114.56system 17:37.72elapsed 98%CPU (0avgtext+0avgdata 239360maxresident)k
0inputs+217152outputs (0major+51542191minor)pagefaults 0swaps
932.26user 111.54system 17:35.36elapsed 98%CPU (0avgtext+0avgdata 239572maxresident)k
0inputs+217152outputs (0major+51537921minor)pagefaults 0swaps

YiFei Zhu