Forget the copy_page below. I used a non cache aligned buffer :-(

However, if I enable the use of "dcbz" and remove "dcbt" in the
original copy_page() and use a cache aligned test buffer,
I still get a speedup of 30% or more on my mpc860 board.
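
For anyone repeating the measurement from user space, here is a rough C
sketch of how a suitably aligned test buffer could be obtained. It is not
the code I used; PAGE_SIZE, LINE_SIZE (16 bytes assumed for 8xx) and both
helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096  /* assumed page size */
#define LINE_SIZE 16    /* assumed 8xx L1 cache line size */

/* True if p starts on a cache line boundary. */
static int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % LINE_SIZE) == 0;
}

/* Allocate one zero-filled, page aligned test buffer; NULL on failure.
 * Page alignment implies cache line alignment, since PAGE_SIZE is a
 * multiple of LINE_SIZE. */
static unsigned char *alloc_test_page(void)
{
    unsigned char *p = aligned_alloc(PAGE_SIZE, PAGE_SIZE);

    if (p != NULL) {
        assert(is_line_aligned(p));
        memset(p, 0, PAGE_SIZE);
    }
    return p;
}
```

Allocate source and destination with alloc_test_page() and release them
with free(). A buffer offset by a few bytes from such a page is the
misaligned case that skewed my first numbers.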

I think a new CONFIG option is appropriate, where one can turn
on the use of "dcbz" for 8xx. OK?
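
To make the idea concrete, here is a portable C model of the clear-ahead
loop from the quoted copy_page(), with a compile time switch standing in
for such an option. This is only a sketch: COPY_PAGE_USE_DCBZ is a
hypothetical name, not an existing CONFIG symbol, memset() stands in for
dcbz, and the 16 byte line size is an assumption for 8xx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096
#define LINE_SIZE 16                      /* assumed 8xx L1 cache line size */
#define NLINES    (PAGE_SIZE / LINE_SIZE)

/* Hypothetical switch standing in for the proposed CONFIG option. */
#ifndef COPY_PAGE_USE_DCBZ
#define COPY_PAGE_USE_DCBZ 1
#endif

/* Stand-in for dcbz: zero one destination cache line and count it. */
static void model_dcbz(unsigned char *line, unsigned *count)
{
#if COPY_PAGE_USE_DCBZ
    memset(line, 0, LINE_SIZE);
    ++*count;
#else
    (void)line;
    (void)count;
#endif
}

/* Copy one page; returns how many destination lines were zeroed. */
static unsigned model_copy_page(unsigned char *to, const unsigned char *from)
{
    unsigned zeroed = 0;
    size_t i;

    model_dcbz(to, &zeroed);              /* line 0, before the loop */
    for (i = 0; i < NLINES - 1; i++) {
        /* Zero line i+1 while line i is being copied. */
        model_dcbz(to + (i + 1) * LINE_SIZE, &zeroed);
        memcpy(to + i * LINE_SIZE, from + i * LINE_SIZE, LINE_SIZE);
    }
    /* Last line: nothing left to zero ahead, so just copy it. */
    memcpy(to + (NLINES - 1) * LINE_SIZE,
           from + (NLINES - 1) * LINE_SIZE, LINE_SIZE);
    return zeroed;
}
```

With the switch on, every destination line is zeroed exactly once and
nothing past the end of the page is touched, which is why the loop count
is one less than the number of lines. Building with
-DCOPY_PAGE_USE_DCBZ=0 gives the plain copy for comparison.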

 Jocke

> Hi all
>
> I have been playing with the copy_page() function in arch/ppc/kernel/misc.S
> and gained about 30% speed up for my mpc860, rev D4 MHz.
>
> This is what I did:
> - Use dcbz on 8xx, but clear ahead one cache line (performance is really crappy
>   if I don't clear ahead). This is the biggest improvement.
> - Use prefetch for 8xx as well.
>
> I know that dcbz is buggy for some 8xx CPUs but I don't know which ones.
> For me it works just fine, except in copy_tofrom_user (I don't know why).
>
> I would like to get some feedback & test results both for 8xx and non 8xx.
> Please include exact CPU and revision.
>
>  Thanks
>          Jocke
>
> _GLOBAL(copy_page)
>       addi    r3,r3,-4
>       addi    r4,r4,-4
>       li      r5,4
> #if MAX_COPY_PREFETCH > 1
>       /* This will prefetch past end of page, does not seem to be a problem? */
>       li      r0,MAX_COPY_PREFETCH
>       li      r11,4
>       mtctr   r0
> 11:   dcbt    r11,r4
>       addi    r11,r11,L1_CACHE_LINE_SIZE
>       bdnz    11b
> #else /* MAX_COPY_PREFETCH == 1 */
>       dcbt    r5,r4
>       li      r11,L1_CACHE_LINE_SIZE+4
> #endif /* MAX_COPY_PREFETCH */
>       /* Older 8xx CPUs may have buggy dcbz instructions; if so, try
>        * "dcbt r5,r3" instead */
>       dcbz    r5,r3
>       addi    r5,r5,L1_CACHE_LINE_SIZE
>       /* All but the last cache line of data, due to the dcbz below */
>       li      r0,4096/L1_CACHE_LINE_SIZE-1
>       mtctr   r0
> 1:
>       dcbt    r11,r4
>       /* Zero the cache line after the one that is being copied.
>        * Older 8xx CPUs may have buggy dcbz instructions; if so, try
>        * "dcbt r5,r3" instead */
>       dcbz    r5,r3
>       COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 32
>       COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 64
>       COPY_16_BYTES
>       COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 128
>       COPY_16_BYTES
>       COPY_16_BYTES
>       COPY_16_BYTES
>       COPY_16_BYTES
> #endif
> #endif
> #endif
>       bdnz    1b
> /* Copy the last cache line of data */
>       COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 32
>       COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 64
>       COPY_16_BYTES
>       COPY_16_BYTES
> #if L1_CACHE_LINE_SIZE >= 128
>       COPY_16_BYTES
>       COPY_16_BYTES
>       COPY_16_BYTES
>       COPY_16_BYTES
> #endif
> #endif
> #endif
>       blr
>


** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


