Am Dienstag, den 23.04.2019, 19:21 +0200 schrieb Ahmad Fatoum:
> On 4/23/19 7:18 PM, Ahmad Fatoum wrote:
> > The ARM Cortex-A Series Programmer's Guide notes[1]:
> > > Some care is required with cache maintenance activity in multi-core
> 
> Unfortunately, the only error scenario I could find in the ARM docs
> was this one here, which presumes a multi-core system.
> 
> Lucas, do you have a better example in context of barebox on how things
> could go awry without this patch?

In the Barebox case we don't care about SMP, but about the HW
prefetcher, which is an independent master and isn't bound by the
instructions executed on the core.

So the sequence that could go wrong in Barebox is as follows:
1. CPU core starts invalidating at L1 cache level
2. HW prefetcher decides that a specific address should be brought into
the L1 cache.
3. HW prefetcher finds a valid block for the requested address in L2
cache and moves cached data from L2 to L1.
4. Only now CPU core invalidates L2 cache.

In the above sequence we now have invalid data in the L1 cache line.
The correct sequence will avoid this issue:

1. CPU core starts invalidating at L2 cache level
2. HW prefetcher decides that a specific address should be brought into
the L1 cache.
3. HW prefetcher sees invalid tags for the requested address in L2
cache and brings in the data from external memory.
4. CPU core invalidates L1 cache, discarding the prefetched data.

With the correct sequence we never end up with invalid data in valid
cache lines, we just waste some external memory bandwidth on the
discarded prefetch.

Regards,
Lucas

> 
> > > systems that include a L2C-310 L2 cache (or similar). Cleaning or
> > > invalidating the L1 cache and L2 cache will not be a single atomic
> > > operation. A core might therefore perform cache maintenance on
> > > a particular address in both L1 and L2 caches only as two discrete
> > > steps. If another core were to access the affected address between
> > > those two actions, a coherency problem can occur. Such problems can
> > > be avoided by following two simple rules.
> > > 
> > > * When cleaning, always clean the innermost (L1) cache first and then
> > >   clean the outer cache(s).
> > > * When invalidating, always invalidate the outermost cache first and
> > >   the L1 cache last.
> > 
> > The current code correctly iterates from inner to outer cache levels
> > when flushing/cleaning (r8 == 0), invalidation (r8 == 1) occurs in the
> > same direction though. Adjust the invalidation iteration order to start
> > from the outermost cache instead.
> > 
> > Equivalent C-Code:
> > 
> >     enum cache_op { CACHE_FLUSH = 0, CACHE_INVAL = 1 };
> >     register enum cache_op operation asm("r8");
> >     register int i asm("r12");
> >     register int limit asm("r3") = max_cache_level << 1; // e.g. 4 with L2 
> > max
> > 
> >     +if (operation == CACHE_FLUSH) {
> > > >                 i = 0;
> >     +} else {
> > > >         +       i = limit - 2;
> >     +}
> > 
> >      bool loop_again;
> >      do {
> > > >                 /* [snip] */
> > > >         +       if (operation == CACHE_FLUSH) {
> > > >                         i += 2;
> > > >                         loop_again = limit > i;
> > > >         +       } else {
> > > >         +               loop_again = i > 0;
> > > >         +               i -= 2;
> > > >         +       }
> >     } while (loop_again);
> > 
> > [1]: 18.6 "TLB and cache maintenance broadcast", Version 4.0
> > 
> > > > Suggested-by: Lucas Stach <[email protected]>
> > > > Signed-off-by: Ahmad Fatoum <[email protected]>
> > ---
> >  arch/arm/cpu/cache-armv7.S | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/cpu/cache-armv7.S b/arch/arm/cpu/cache-armv7.S
> > index 0eb0ecfee756..b2d70bece7b9 100644
> > --- a/arch/arm/cpu/cache-armv7.S
> > +++ b/arch/arm/cpu/cache-armv7.S
> > @@ -83,7 +83,10 @@ hierarchical:
> > > > > > > >                 ands    r3, r0, #0x7000000      @ extract loc 
> > > > > > > > from clidr
> > > > > > > >                 mov     r3, r3, lsr #23         @ left align 
> > > > > > > > loc bit field
> > > > > > > >                 beq     finished                @ if loc is 0, 
> > > > > > > > then no need to clean
> > > > > > > > -               mov     r12, #0                 @ start clean 
> > > > > > > > at cache level 0
> > > > > > +           cmp     r8, #0
> > > > > > > > +THUMB(         ite     eq                      )
> > > > > > +           moveq   r12, #0
> > > > > > > > +               subne   r12, r3, #2             @ start 
> > > > > > > > invalidate at outmost cache level
> >  loop1:
> > > > > > > >                 add     r2, r12, r12, lsr #1    @ work out 3x 
> > > > > > > > current cache level
> > > > > > > >                 mov     r1, r0, lsr r2          @ extract cache 
> > > > > > > > type bits from clidr
> > > > > > > > @@ -118,8 +121,15 @@ THUMB(             ite     eq              
> > > > > > > >         )
> > > > > > > >                 subs    r7, r7, #1              @ decrement the 
> > > > > > > > index
> > > > > >             bge     loop2
> >  skip:
> > > > > > +           cmp     r8, #0
> > > > > > +           bne     inval_check
> > > > > > > >                 add     r12, r12, #2            @ increment 
> > > > > > > > cache number
> > > > > >             cmp     r3, r12
> > > > > > +           b       loop_end_check
> > +inval_check:
> > > > > > +           cmp     r12, #0
> > > > > > > > +               sub     r12, r12, #2            @ decrement 
> > > > > > > > cache number
> > +loop_end_check:
> >  #ifdef CONFIG_ARM_ERRATA_814220
> > > >                 dsb
> >  #endif
> > 
> 
> 

_______________________________________________
barebox mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/barebox

Reply via email to