On Wed, 29 Apr 2020, Heinz Mauelshagen wrote:

> On 4/29/20 6:30 PM, Mikulas Patocka wrote:
> > Hi
> > 
> > This is the clflushopt patch for the next merge window.
> > 
> > Mikulas
> > 
> > 
> > From: Mikulas Patocka <[email protected]>
> > 
> > When testing the dm-writecache target on a real Optane-based persistent
> > memory, it turned out that explicit cache flushing using the clflushopt
> > instruction performs better than non-temporal stores for block sizes 1k,
> > 2k and 4k.
> > 
> > This patch adds a new function memcpy_flushcache_optimized that tests if
> > clflushopt is present - and if it is, we use it instead of
> > memcpy_flushcache.
> > 
> > Signed-off-by: Mikulas Patocka <[email protected]>
> > 
> > ---
> >   drivers/md/dm-writecache.c |   29 ++++++++++++++++++++++++++++-
> >   1 file changed, 28 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6/drivers/md/dm-writecache.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-writecache.c       2020-04-29 18:09:53.599999000 +0200
> > +++ linux-2.6/drivers/md/dm-writecache.c    2020-04-29 18:22:36.139999000 +0200
> > @@ -1137,6 +1137,33 @@ static int writecache_message(struct dm_
> >     return r;
> >   }
> > +static void memcpy_flushcache_optimized(void *dest, void *source, size_t size)
> > +{
> > +   /*
> > +    * clflushopt performs better with block size 1024, 2048, 4096
> > +    * non-temporal stores perform better with block size 512
> > +    *
> > +    * block size   512             1024            2048            4096
> > +    * movnti       496 MB/s        642 MB/s        725 MB/s        744 MB/s
> > +    * clflushopt   373 MB/s        688 MB/s        1.1 GB/s        1.2 GB/s
> > +    */
> > +#ifdef CONFIG_X86
> > +   if (static_cpu_has(X86_FEATURE_CLFLUSHOPT) &&
> > +       likely(boot_cpu_data.x86_clflush_size == 64) &&
> > +       likely(size >= 768)) {
> > +           do {
> > +                   memcpy((void *)dest, (void *)source, 64);
> > +                   clflushopt((void *)dest);
> > +                   dest += 64;
> > +                   source += 64;
> > +                   size -= 64;
> > +           } while (size >= 64);
> > +           return;
> > +   }
> > +#endif
> > +   memcpy_flushcache(dest, source, size);
> > +}
> 
> 
> Aren't memory barriers needed for ordering before and after the loop?
> 
> Heinz

This is called while holding the writecache lock - and wc_unlock serves as 
a memory barrier.
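
To make the barrier point concrete, here is a rough sketch (illustrative
only, not part of the patch; the function name is made up) of what an
explicitly fenced variant would look like if this helper were ever called
outside the lock:

static void memcpy_flushcache_fenced(void *dest, void *source, size_t size)
{
	memcpy_flushcache_optimized(dest, source, size);
	/*
	 * clflushopt is weakly ordered with respect to other stores, so a
	 * caller that cannot rely on a lock release for ordering needs an
	 * explicit store fence before it may treat the data as persistent.
	 */
	wmb();	/* compiles to sfence on x86 */
}

In the current code that fence would be redundant, because wc_unlock()
already provides the required ordering.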

Mikulas
