On Thu, Nov 22, 2018 at 9:53 AM Linus Torvalds <torva...@linux-foundation.org> wrote:
>
> On Thu, Nov 22, 2018 at 9:36 AM David Laight <david.lai...@aculab.com> wrote:
> >
> > The other problem with the ERMS copy is that it gets used
> > for copy_to/from_io() - and the 'rep movsb' on uncached
> > locations has to do byte copies.
>
> Ugh. I thought we changed that *long* ago, because even our non-ERMS
> copy is broken for PCI (it does overlapping stores for the small tail
> cases).
>
> But looking at "memcpy_{from,to}io()", I don't see x86 overriding it
> with anything better.
>
> I suspect nobody uses those functions for anything critical any more.
> The fbcon people have their own copy functions, iirc.
>
> But we definitely should fix this. *NONE* of the regular memcpy
> functions actually work right for PCI space any more, and haven't for
> a long time.

I'm not personally volunteering, but I suspect we can do much better
than we do now:

 - The new MOVDIRI and MOVDIR64B instructions can do big writes to WC
   and UC memory. I assume those would be safe to use in ...toio()
   functions, unless there are quirky devices out there that blow up
   if their MMIO space is written in 64-byte chunks.

 - MOVNTDQA can, I think, do 64-byte loads, but only from WC memory.
   For sufficiently large copies, it could plausibly be faster to
   create a WC alias and use MOVNTDQA than it is to copy in 8- or
   16-byte chunks. The i915 driver has a copy implementation using
   MOVNTDQA -- maybe this should get promoted to something in arch/x86
   called memcpy_from_wc().

--Andy
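
A rough userspace sketch of the MOVNTDQA idea, using the SSE4.1
intrinsic _mm_stream_load_si128(). The name memcpy_from_wc_sketch()
and its constraints (16-byte-aligned pointers, length a multiple of 64)
are assumptions made for illustration; this is not the i915 code. In
the kernel, the SSE use would additionally need to sit between
kernel_fpu_begin() and kernel_fpu_end(). On ordinary write-back memory
MOVNTDQA behaves like a normal aligned load, so this only shows the
access pattern, not any speedup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/*
 * Hypothetical helper: copy len bytes with MOVNTDQA, issuing four
 * 16-byte streaming loads (one 64-byte line) per iteration.
 * Returns false, copying nothing, if the constraints are not met or
 * the CPU lacks SSE4.1; the caller is then expected to fall back to
 * another copy routine.
 */
__attribute__((target("sse4.1")))
static bool memcpy_from_wc_sketch(void *dst, const void *src, size_t len)
{
	/* MOVNTDQA and MOVDQA both need 16-byte-aligned addresses. */
	if ((((uintptr_t)src | (uintptr_t)dst) & 15) || (len & 63))
		return false;
	if (!__builtin_cpu_supports("sse4.1"))
		return false;

	__m128i *s = (__m128i *)src;
	__m128i *d = (__m128i *)dst;

	for (; len; len -= 64, s += 4, d += 4) {
		/* Issue all four loads of the line before any store. */
		__m128i a = _mm_stream_load_si128(s + 0);
		__m128i b = _mm_stream_load_si128(s + 1);
		__m128i c = _mm_stream_load_si128(s + 2);
		__m128i e = _mm_stream_load_si128(s + 3);

		_mm_store_si128(d + 0, a);
		_mm_store_si128(d + 1, b);
		_mm_store_si128(d + 2, c);
		_mm_store_si128(d + 3, e);
	}
	return true;
}
#else
/* Non-x86 builds: always ask the caller to fall back. */
static bool memcpy_from_wc_sketch(void *dst, const void *src, size_t len)
{
	(void)dst; (void)src; (void)len;
	return false;
}
#endif
```

A caller would try this first and use a plain copy when it returns
false, e.g. if (!memcpy_from_wc_sketch(dst, src, len)) memcpy(dst, src, len);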