RE: [PATCH] x86: only use ERMS for user copies for larger sizes

David Laight Mon, 26 Nov 2018 02:12:38 -0800

From: Andy Lutomirski
> Sent: 23 November 2018 19:11
> > On Nov 23, 2018, at 11:44 AM, Linus Torvalds 
> > <torva...@linux-foundation.org> wrote:
> >
> >> On Fri, Nov 23, 2018 at 10:39 AM Andy Lutomirski <l...@amacapital.net> 
> >> wrote:
> >>
> >> What is memcpy_to_io even supposed to do?  I’m guessing it’s defined as
> >> something like “copy this data to IO space using at most long-sized writes,
> >> all aligned, and writing each byte exactly once, in order.”
> >> That sounds... dubiously useful.
> >
> > We've got hundreds of users of it, so it's fairly common..
> 
> I’m wondering if the “at most long-sized” restriction matters, especially
> given that we’re apparently accessing some of the same bytes more than once.
> I would believe that trying to encourage 16-byte writes (with AVX, ugh) or
> 64-byte writes (with MOVDIR64B) would be safe and could meaningfully speed
> up some workloads.


The real gains come from increasing the width of IO reads, not IO writes.
None of the x86 CPUs I've got issue multiple concurrent PCIe reads
(the PCIe completion tag seems to match the core number).
PCIe writes are all 'posted', so there aren't big gaps between them.
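
To put rough numbers on the read-width point: if a core really has only
one PCIe read outstanding at a time (an assumption based on the
observation above, not a documented guarantee), the cost of reading a
buffer scales with the number of read round trips. The sketch below just
counts those round trips; io_read_count() is a made-up helper name, not
a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of non-posted reads needed to fetch
 * 'len' bytes when each read transfers 'width' bytes.
 * Assumes reads are fully serialised, so each one costs a round trip.
 */
static size_t io_read_count(size_t len, size_t width)
{
	assert(width != 0);
	/* Round up so a trailing partial chunk still counts as one read. */
	return (len + width - 1) / width;
}
```

Under that assumption a 64-byte read takes 16 round trips with 4-byte
reads, 8 with 8-byte reads and 4 with 16-byte reads, which is why wider
reads help far more than wider (posted) writes.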

> >> I could see a function that writes to aligned memory in specified-sized 
> >> chunks.
> >
> > We have that. It's called "__iowrite{32,64}_copy()". It has very few users.

For x86 you want separate entry points: one for the 'rep movsq' copy
and one using an instruction loop
(perhaps with guidance on the cutover length).
In most places the driver will know whether the size is above or below
the cutover, which might be 256 bytes.
Certainly transfers below 64 bytes are 'short'.
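
A minimal userspace sketch of what such a split might look like. All the
names here (copy_short_loop, copy_bulk, copy_auto, COPY_CUTOVER_BYTES)
are hypothetical, memcpy() merely stands in for the 'rep movs' style
copy, and the 256 is only the guess from above, not a measured value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutover; "might be 256" is a guess, not a measurement. */
#define COPY_CUTOVER_BYTES 256

/*
 * Short-copy entry point: a plain loop of 64-bit loads and stores.
 * Assumes both buffers are 8-byte aligned and len is a multiple of 8.
 */
static void copy_short_loop(void *dst, const void *src, size_t len)
{
	uint64_t *d = dst;
	const uint64_t *s = src;
	size_t i;

	assert(len % sizeof(uint64_t) == 0);
	for (i = 0; i < len / sizeof(uint64_t); i++)
		d[i] = s[i];
}

/*
 * Bulk entry point: memcpy() is a stand-in for the string-instruction
 * copy, whose setup cost only pays off for larger sizes.
 */
static void copy_bulk(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Fallback for callers that do not know the size in advance. */
static void copy_auto(void *dst, const void *src, size_t len)
{
	if (len < COPY_CUTOVER_BYTES)
		copy_short_loop(dst, src, len);
	else
		copy_bulk(dst, src, len);
}
```

A driver that always moves, say, a 32-byte descriptor would call
copy_short_loop() directly and skip the length test; one that moves
whole packet buffers would call copy_bulk(). Only callers with genuinely
variable sizes need copy_auto().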

        David

