Re: N800 & Video playback

Daniel Amelang Tue, 01 May 2007 21:51:31 -0700

On 4/30/07, Daniel Stone <[EMAIL PROTECTED]> wrote:


> > There are two important optimizations in this code:
> > 1. Cache prefetch with PLD instruction (added in '_armv5' version) which
> > boosts performance to 70 megapixels per second. Inner loop is unrolled
> > to process 32 pixels per iteration (cache line size is 32 bytes on ARM, so
> > such unrolling is convenient). This is the most important improvement.
> > You can try using __builtin_prefetch() from C code to do the same
> > optimization.
>
> Ah, sounds useful.  From what Dan Amelang's been saying on xorg@, gcc
> should coalesce four 32-bit reads into one 128-bit read, but this sounds
> promising as well.


To expand on this: I was referring to the fact that gcc is pretty smart
about using ldmia/stmia instructions to cluster sequential
reads/writes. I see that Siarhei is already using this technique in
his assembler code, so nothing new here.
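
For illustration only, here is a rough C sketch of the two ideas in this
thread: sequential 32-bit loads/stores that gcc may merge into
ldmia/stmia, plus a __builtin_prefetch() hint one cache line ahead. This
is not Siarhei's code. The function name copy_words_unrolled and the
prefetch distance of 8 words (32 bytes, the cache line size mentioned
above) are assumptions made up for this example, and whether gcc really
emits ldmia/stmia depends on the compiler version and flags, so check
the output of "gcc -O2 -S".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: copy n 32-bit words. */
void copy_words_unrolled(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i;

    /* Main loop: 8 words (32 bytes) per iteration. */
    for (i = 0; i + 8 <= n; i += 8) {
        /* Hint that the next 32 bytes will be read soon.
         * The distance of 8 words is an assumption, not a tuned value.
         * Prefetching an address that is never read does not fault. */
        __builtin_prefetch(src + i + 8, 0, 0);

        /* Sequential loads into locals: candidates for one or two ldmia. */
        uint32_t a = src[i + 0], b = src[i + 1];
        uint32_t c = src[i + 2], d = src[i + 3];
        uint32_t e = src[i + 4], f = src[i + 5];
        uint32_t g = src[i + 6], h = src[i + 7];

        /* Sequential stores: candidates for one or two stmia. */
        dst[i + 0] = a; dst[i + 1] = b;
        dst[i + 2] = c; dst[i + 3] = d;
        dst[i + 4] = e; dst[i + 5] = f;
        dst[i + 6] = g; dst[i + 7] = h;
    }

    /* Tail: copy the remaining 0..7 words one at a time. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Loading everything into locals before storing keeps the reads and the
writes in two sequential groups, which is the pattern the compiler needs
to see before it can cluster them.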

Dan
_______________________________________________
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
