Hi Vineet,

On Tue, 2019-01-15 at 01:09 +0000, Vineet Gupta wrote:
> On 1/14/19 7:17 AM, Eugeniy Paltsev wrote:
> > The current ARCv2 memset implementation may execute a 'prefetchw'
> > instruction for an address which lies outside of the memset area,
> > so we get one modified (dirty) cache line outside of the memset
> > area. This may lead to data corruption if this area is used
> > for DMA IO.
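
For illustration, here is a minimal standalone C sketch of that overshoot
(addresses are made up and the loop is simplified; only the 64-byte prefetch
offset mirrors the asm):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uintptr_t dst = 0x1000, len = 64;     /* hypothetical memset area  */
	uintptr_t pf  = dst + 64;             /* prefetchw [r3, 64] target */
	uintptr_t line = pf & ~(uintptr_t)63; /* 64B line it makes dirty   */

	if (line >= dst + len)
		printf("prefetchw dirties line 0x%lx, outside the buffer\n",
		       (unsigned long)line);
	return 0;
}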
> > 
> > Another issue is that the current ARCv2 memset implementation
> > may execute a 'prealloc' instruction for an L1 cache line which
> > doesn't fully belong to the memset area in case of a 128B L1 D$
> > line length. That leads to data corruption.
> > 
> > 
> > Fix prefetchw/prealloc instruction usage in case of a 64B L1 data
> > cache line (default case) and don't use prefetch* instructions
> > for other possible L1 data cache line lengths (32B and 128B).
> > 
> > Signed-off-by: Eugeniy Paltsev <eugeniy.palt...@synopsys.com>
> > ---
> >  arch/arc/lib/memset-archs.S | 30 +++++++++++++++++++++++-------
> >  1 file changed, 23 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/arc/lib/memset-archs.S b/arch/arc/lib/memset-archs.S
> > index 62ad4bcb841a..c7717832336f 100644
> > --- a/arch/arc/lib/memset-archs.S
> > +++ b/arch/arc/lib/memset-archs.S
> > @@ -7,11 +7,32 @@
> >   */
> >  
> >  #include <linux/linkage.h>
> > +#include <asm/cache.h>
> >  
> >  #undef PREALLOC_NOT_AVAIL
> >  
> > +/*
> > + * The memset implementation below is optimized to use prefetchw and prealloc
> > + * instructions in case of a CPU with a 64B L1 data cache line (L1_CACHE_SHIFT == 6).
> > + * If you want to implement an optimized memset for other possible L1 data cache
> > + * line lengths (32B and 128B) you should rewrite the code, carefully checking
> > + * that we don't issue any prefetchw/prealloc instruction for L1 cache lines
> > + * which don't belong to the memset area.
> 
> Good point. FWIW, it is possible to support those non-common line lengths by
> using L1_CACHE_SHIFT etc. in the asm code below, but I agree it's not worth
> the trouble.
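
For reference, a minimal C sketch of the invariant such a generalized version
would have to maintain (the helper and the stubbed L1_CACHE_SHIFT are mine,
for illustration only - the kernel gets the real value from <asm/cache.h>):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define L1_CACHE_SHIFT 6                 /* stub for the illustration */
#define L1_CACHE_BYTES (1UL << L1_CACHE_SHIFT)

/* A prefetchw/prealloc target is only safe if the whole cache line
 * containing it lies inside [buf, buf + len). */
static bool prefetch_target_is_safe(uintptr_t buf, size_t len,
				    uintptr_t target)
{
	uintptr_t line = target & ~(L1_CACHE_BYTES - 1);

	return line >= buf && line + L1_CACHE_BYTES <= buf + len;
}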
> 
> > + */
> > +#if L1_CACHE_SHIFT!=6
> > +# define PREALLOC_INSTR(...)
> > +# define PREFETCHW_INSTR(...)
> > +#else  /* L1_CACHE_SHIFT!=6 */
> > +# define PREFETCHW_INSTR(...)      prefetchw __VA_ARGS__
> > +# ifdef PREALLOC_NOT_AVAIL
> > +#  define PREALLOC_INSTR(...)      prefetchw __VA_ARGS__
> > +# else
> > +#  define PREALLOC_INSTR(...)      prealloc __VA_ARGS__
> > +# endif
> > +#endif /* L1_CACHE_SHIFT!=6 */
> > +
> >  ENTRY_CFI(memset)
> > -   prefetchw [r0]          ; Prefetch the write location
> > +   PREFETCHW_INSTR([r0])   ; Prefetch the first write location
> >     mov.f   0, r2
> >  ;;; if size is zero
> >     jz.d    [blink]
> > @@ -48,11 +69,7 @@ ENTRY_CFI(memset)
> >  
> >     lpnz    @.Lset64bytes
> >     ;; LOOP START
> > -#ifdef PREALLOC_NOT_AVAIL
> > -   prefetchw [r3, 64]      ;Prefetch the next write location
> > -#else
> > -   prealloc  [r3, 64]
> > -#endif
> > +   PREALLOC_INSTR([r3, 64]) ;Prefetch the next write location
> 
> These are not solving the issue - I'd break this up and move these bits to
> your next patch.

Actually these are solving another issue - the current implementation may
execute a 'prealloc' instruction for an L1 cache line which doesn't fully
belong to the memset area in case of a 128B L1 D$ line length. As 'prealloc'
fills the whole cache line with zeros, this leads to data corruption.
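
A minimal C sketch of that case with made-up addresses (the r3 position is
simplified; the 128B line size and the 64-byte prealloc offset are the point):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uintptr_t dst = 0x1000, len = 192;    /* covers 0x1000..0x10BF   */
	uintptr_t line_sz = 128;              /* 128B L1 D$ line         */
	uintptr_t pa = (dst + 64) + 64;       /* prealloc [r3, 64] with
	                                         r3 at dst + 64          */
	uintptr_t line = pa & ~(line_sz - 1); /* line 0x1080..0x10FF     */

	if (line + line_sz > dst + len)
		printf("prealloc zero-fills up to 0x%lx, past end 0x%lx\n",
		       (unsigned long)(line + line_sz - 1),
		       (unsigned long)(dst + len - 1));
	return 0;
}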

So I would rather keep these changes in this 'fix' patch.


BTW, I've forgotten again to add Cc: sta...@vger.kernel.org - could you add
it for me when applying the patch?
Thanks.


> >  #ifdef CONFIG_ARC_HAS_LL64
> >     std.ab  r4, [r3, 8]
> >     std.ab  r4, [r3, 8]
> > @@ -85,7 +102,6 @@ ENTRY_CFI(memset)
> >     lsr.f   lp_count, r2, 5 ;Last remaining  max 124 bytes
> >     lpnz    .Lset32bytes
> >     ;; LOOP START
> > -   prefetchw   [r3, 32]    ;Prefetch the next write location
> 
> So the real fix for the issue at hand is this line. I'll keep it for the
> fix and beef up the changelog. Thing is, the existing code was already
> skipping the last 64B from the main loop (thus avoiding prefetching the
> next line), but then the reintroduced prefetchw in the last 32B loop
> spoiled the party. That prefetchw was pointless anyways.
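
A quick C sketch of that final-iteration overshoot (size is made up, head
alignment ignored):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uintptr_t dst = 0x1000, len = 256;   /* hypothetical buffer       */
	/* The last iteration of the 32B tail loop runs with
	 * r3 = dst + len - 32, so prefetchw [r3, 32] targets the first
	 * byte past the buffer. */
	uintptr_t pf = (dst + len - 32) + 32;

	if (pf >= dst + len)
		printf("tail prefetchw touches 0x%lx, past the buffer\n",
		       (unsigned long)pf);
	return 0;
}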
> 
> -Vineet
-- 
 Eugeniy Paltsev
