GCC 6+ supports segment qualifiers. Using them makes several optimizations possible:

1. Avoid unnecessary instructions when an operation is carried out on a
   per-cpu value that is read or written, and instead allow the compiler
   to emit instructions that access the per-cpu value directly.

2. Make this_cpu_ptr() more efficient and allow its value to be cached,
   since preemption must be disabled when this_cpu_ptr() is used.

3. Provide a better alternative to this_cpu_read_stable() that caches
   values more efficiently, using an alias attribute to a const variable.

4. Allow the compiler to perform other optimizations (e.g. CSE).

5. Use rip-relative addressing in per_cpu_read_stable(), which makes it
   PIE-ready.

"size" and Peter's compare do not seem to show the impact on code size
reduction correctly. Summing the code size according to nm on defconfig
shows a minor reduction from 11451310 to 11451310 (0.09%).

RFC->v1:
 * Fixing i386 build bug
 * Moving chunk to the right place [Peter]

Nadav Amit (7):
  compiler: Report x86 segment support
  x86/percpu: Use compiler segment prefix qualifier
  x86/percpu: Use C for percpu accesses when possible
  x86: Fix possible caching of current_task
  percpu: Assume preemption is disabled on per_cpu_ptr()
  x86/percpu: Optimized arch_raw_cpu_ptr()
  x86/current: Aggressive caching of current

 arch/x86/include/asm/current.h         |  30 +++
 arch/x86/include/asm/fpu/internal.h    |   7 +-
 arch/x86/include/asm/percpu.h          | 293 +++++++++++++++++++------
 arch/x86/include/asm/preempt.h         |   3 +-
 arch/x86/include/asm/resctrl_sched.h   |  14 +-
 arch/x86/kernel/cpu/Makefile           |   1 +
 arch/x86/kernel/cpu/common.c           |   7 +-
 arch/x86/kernel/cpu/current.c          |  16 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   4 +-
 arch/x86/kernel/process_32.c           |   4 +-
 arch/x86/kernel/process_64.c           |   4 +-
 include/asm-generic/percpu.h           |  12 +
 include/linux/compiler-gcc.h           |   4 +
 include/linux/compiler.h               |   2 +-
 include/linux/percpu-defs.h            |  33 ++-
 15 files changed, 346 insertions(+), 88 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/current.c

-- 
2.17.1