On Wed, Oct 10, 2018 at 05:33:36PM -0700, Eric Dumazet wrote:
> While looking at native_sched_clock() disassembly I had
> the surprise to see the compiler (gcc 7.3 here) had
> optimized out the loop, meaning the code is broken.
>
> Using the documented and approved API not only fixes the bug,
> it also makes the code more readable.
>
> Replacing five this_cpu_read() by one this_cpu_ptr() makes
> the generated code smaller.
Does not for me; that is, the resulting asm is actually larger.

You're quite right that the loop went missing; no idea wth that compiler is smoking (gcc-8.2 for me).

In order to eliminate that loop, the compiler needs to think that two consecutive loads of this_cpu_read(cyc2ns.seq.sequence) will return the same value. But this_cpu_read() is an asm() statement, so the compiler _should_ not assume such. We rely on this_cpu_read() implying READ_ONCE() in a number of locations; this really should not happen.

The reason it was written using this_cpu_read() is so that it can use %gs:-prefixed instructions and avoid ever loading that percpu offset and doing manual address computation.

Let me prod at this with a sharp stick.
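To make the loop argument concrete, here is a minimal userspace sketch of a latch-style sequence-count read loop of the kind described above. It is not the kernel's cyc2ns code: the struct and function names, READ_ONCE_U32(), WRITE_ONCE_U32() and compiler_barrier() are all hypothetical stand-ins, and it is single-threaded. A real concurrent version would use the kernel's seqcount helpers and smp_rmb()/smp_wmb() rather than a bare compiler barrier. The point it shows: the retry loop only survives if each read of 'sequence' is a real load; if the compiler may assume the two reads return the same value, the while condition folds to false and the loop disappears.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for READ_ONCE()/WRITE_ONCE(): the volatile access
 * obliges the compiler to emit a real load (or store) every time, so two
 * consecutive reads of 'sequence' cannot be merged into one. */
#define READ_ONCE_U32(x)     (*(const volatile uint32_t *)&(x))
#define WRITE_ONCE_U32(x, v) (*(volatile uint32_t *)&(x) = (v))

/* Compiler-only barrier: plain loads/stores may not be moved across it. */
#define compiler_barrier() atomic_signal_fence(memory_order_seq_cst)

/* Illustrative layout, loosely modelled on cyc2ns; names are made up. */
struct cyc2ns_data_sketch {
	uint32_t mul;
	uint32_t shift;
	uint64_t offset;
};

struct cyc2ns_sketch {
	uint32_t sequence;                 /* bumped twice per update */
	struct cyc2ns_data_sketch data[2]; /* latch: two copies */
};

/* Writer: the latch keeps one copy stable while the other is changed. */
static void cyc2ns_write_sketch(struct cyc2ns_sketch *c,
				const struct cyc2ns_data_sketch *nd)
{
	uint32_t seq = READ_ONCE_U32(c->sequence);

	assert((seq & 1) == 0);               /* writer starts on an even count */
	WRITE_ONCE_U32(c->sequence, seq + 1); /* odd: readers use data[1] */
	compiler_barrier();
	c->data[0] = *nd;
	compiler_barrier();
	WRITE_ONCE_U32(c->sequence, seq + 2); /* even: readers use data[0] */
	compiler_barrier();
	c->data[1] = *nd;
}

/* Reader: retry until 'sequence' is unchanged across the data copy. */
static void cyc2ns_read_sketch(const struct cyc2ns_sketch *c,
			       struct cyc2ns_data_sketch *out)
{
	uint32_t seq;

	do {
		seq = READ_ONCE_U32(c->sequence);
		compiler_barrier();
		*out = c->data[seq & 1];
		compiler_barrier();
	} while (seq != READ_ONCE_U32(c->sequence)); /* second, real load */
}
```

Replace READ_ONCE_U32() in the reader with a plain read of c->sequence and the compiler is free to treat both loads as one and drop the loop, which is the failure mode being discussed.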

