> Subject: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
> 
> Before we rework the "pmem api" to stop abusing __copy_user_nocache()
> for memcpy_to_pmem() we need to fix cases where we may strand dirty data
> in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
> for arbitrary data transfers from userspace. There is no guarantee that
> these transfers, performed by dax_iomap_actor(), will have aligned
> destinations or aligned transfer lengths. Backstop the usage of
> __copy_user_nocache() with explicit cache management in these unaligned
> cases.
> 
> Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
> that is saved for a later patch that moves the entirety of the "pmem
> api" into the pmem driver directly.
 :
> ---
> v2: Change the condition for flushing the last cacheline of the
>     destination from 8-byte to 4-byte misalignment (Toshi)
 :
>  arch/x86/include/asm/pmem.h |   41 ++++++++++++++++++++++++++++++----
 :
> @@ -94,7 +86,34 @@ static inline size_t arch_copy_from_iter_pmem(void *addr, size_t bytes,
>       /* TODO: skip the write-back by always using non-temporal stores */
>       len = copy_from_iter_nocache(addr, bytes, i);
> 
> -     if (__iter_needs_pmem_wb(i))
> +     /*
> +      * In the iovec case on x86_64 copy_from_iter_nocache() uses
> +      * non-temporal stores for the bulk of the transfer, but we need
> +      * to manually flush if the transfer is unaligned. In the
> +      * non-iovec case the entire destination needs to be flushed.
> +      */
> +     if (iter_is_iovec(i)) {
> +             unsigned long dest = (unsigned long) addr;
> +
> +             /*
> +              * If the destination is not 8-byte aligned then
> +              * __copy_user_nocache (on x86_64) uses cached copies
> +              */
> +             if (dest & 8) {
> +                     arch_wb_cache_pmem(addr, 1);
> +                     dest = ALIGN(dest, 8);
> +             }
> +
> +             /*
> +              * If the remaining transfer length, after accounting
> +              * for destination alignment, is not 4-byte aligned
> +              * then __copy_user_nocache() falls back to cached
> +              * copies for the trailing bytes in the final cacheline
> +              * of the transfer.
> +              */
> +             if ((bytes - (dest - (unsigned long) addr)) & 4)
> +                     arch_wb_cache_pmem(addr + bytes - 1, 1);
> +     } else
>               arch_wb_cache_pmem(addr, bytes);
> 
>       return len;

Thanks for the update.  I think the alignment check should be based on
the following note in copy_user_nocache.

 * Note: Cached memory copy is used when destination or size is not
 * naturally aligned. That is:
 *  - Require 8-byte alignment when size is 8 bytes or larger.
 *  - Require 4-byte alignment when size is 4 bytes.
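
Spelled out as code, that note amounts to the predicate below (purely
illustrative; nocache_copy_avoids_cache is a made-up name, and this takes
a conservative reading where any tail that is not a multiple of 8 bytes
may fall back to cached copies):

        /* true when the whole copy stays on the non-temporal path */
        static inline bool nocache_copy_avoids_cache(unsigned long dest,
                        size_t size)
        {
                if (size >= 8)  /* needs 8-byte aligned dest and size */
                        return !(dest & 7) && !(size & 7);
                if (size == 4)  /* needs 4-byte aligned dest */
                        return !(dest & 3);
                return false;   /* other small sizes use cached copies */
        }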

So, I think the code may be something like this.  I also made the following
changes:
 - Mask with 7, not 8.
 - ALIGN with the cacheline size, instead of 8.
 - Add a (bytes > flushed) test, since an unsigned long subtraction would
   otherwise wrap a negative difference around to a huge positive value.

        if (bytes < 8) {
                if ((dest & 3) || (bytes != 4))
                        arch_wb_cache_pmem(addr, 1);
        } else {
                if (dest & 7) {
                        dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
                        arch_wb_cache_pmem(addr, 1);
                }

                flushed = dest - (unsigned long) addr;
                if ((bytes > flushed) && ((bytes - flushed) & 7))
                        arch_wb_cache_pmem(addr + bytes - 1, 1);
        }
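
For reference, folding that into the quoted hunk would give roughly the
following shape (an untested sketch on my part; the declarations and the
non-iovec branch come from the surrounding function in your patch):

        static inline size_t arch_copy_from_iter_pmem(void *addr, size_t bytes,
                        struct iov_iter *i)
        {
                size_t len;

                /* TODO: skip the write-back by always using non-temporal stores */
                len = copy_from_iter_nocache(addr, bytes, i);

                /*
                 * In the iovec case copy_from_iter_nocache() uses
                 * non-temporal stores for the bulk of the transfer, so
                 * only an unaligned head or tail may need flushing. In
                 * the non-iovec case flush the entire destination.
                 */
                if (iter_is_iovec(i)) {
                        unsigned long flushed, dest = (unsigned long) addr;

                        if (bytes < 8) {
                                if ((dest & 3) || (bytes != 4))
                                        arch_wb_cache_pmem(addr, 1);
                        } else {
                                if (dest & 7) {
                                        dest = ALIGN(dest,
                                                boot_cpu_data.x86_clflush_size);
                                        arch_wb_cache_pmem(addr, 1);
                                }

                                flushed = dest - (unsigned long) addr;
                                if ((bytes > flushed) &&
                                                ((bytes - flushed) & 7))
                                        arch_wb_cache_pmem(addr + bytes - 1, 1);
                        }
                } else
                        arch_wb_cache_pmem(addr, bytes);

                return len;
        }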

Thanks,
-Toshi
