On Fri, May 29, 2026 at 04:29:34PM -0300, Jason Gunthorpe wrote:
> On Fri, May 29, 2026 at 05:55:16PM +0100, David Laight wrote:
> > On Fri, 29 May 2026 10:49:47 -0300
> > Jason Gunthorpe <[email protected]> wrote:
> >
> > > On Thu, May 28, 2026 at 06:13:26PM +0000, David Matlack wrote:
> > >
> > > > Let's put these in tools/arch/arm64/include/asm/io.h so that the tools
> > > > headers are more aligned with the kernel headers, and so that the arm64
> > > > io.h overrides are done in the same way as the x86 overrides in
> > > > tools/arch/x86/include/asm/io.h.
> > > >
> > > > Something like this (untested):
> > >
> > > Okay, the disassembly says it works:
> > >
> > >     1db8: ca080108 eor x8, x8, x8
> > >     1dbc: b5000008 cbnz x8, 1dbc <readl+0x58>
> > >     1dc0: f9000fe8 str x8, [sp, #24]
> >
> > That looks strange, I suspect the C didn't match any usual pattern.
> > Normally 'tmp' would get thrown away and 'v' would get kept.
> > But you seem to have discarded 'v' and written 'tmp' to stack.
>
> Oh interesting the optimizer isn't turned on for selftest builds. So
> the str is dutifully writing tmp to the stack. Another register has
> the actual value.
>
> > I'm probably being stupid again, but how does that work?
> > The cpu can speculate straight through the control dependency into
> > the following instructions.
> > An 'eor x1, x8, x8' may not even have a data-dependency on x8.
> > (Most x86 cpus just generate a zero for the equivalent instruction.)
>
> I can't say, this is copied from the kernel and Will made it:
>
>   arm64: io: Ensure calls to delay routines are ordered against prior
>   readX()

This is specifically for ordering counter accesses against prior
barriered MMIO reads. Userspace should really be using the vDSO instead
of accessing the counter directly, so you could probably drop this for
the tools headers tbh and just have the dma_rmb().

Will
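
For illustration only, a rough (untested) sketch of what "just have the
dma_rmb()" could look like for a tools-side readl(). The names
tools_dma_rmb() and tools_readl() are hypothetical and are not taken from
the patch under discussion. The arm64 branch uses "dmb oshld", which is
what the kernel's arm64 dma_rmb() expands to; the fence used on other
hosts is an assumption so that the sketch builds anywhere, and the plain
volatile load is a simplification of the kernel's inline-asm load.

```c
#include <stdint.h>

/* Hypothetical helper: read barrier issued after the MMIO load. */
static inline void tools_dma_rmb(void)
{
#if defined(__aarch64__)
	/* Same instruction the kernel uses for dma_rmb() on arm64. */
	__asm__ volatile("dmb oshld" ::: "memory");
#else
	/* Assumption: stand-in fence so the sketch builds on other hosts. */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
#endif
}

/*
 * Hypothetical readl(): a volatile 32-bit load followed by dma_rmb() only,
 * without the eor/cbnz dummy control dependency quoted above.
 */
static inline uint32_t tools_readl(const volatile void *addr)
{
	uint32_t v = *(const volatile uint32_t *)addr;

	tools_dma_rmb();
	return v;
}
```

With this shape nothing orders a later counter read against the MMIO
read, which is the part that is only needed by code accessing the
counter directly.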
