On NEON alignment,
Aligning buffers can improve performance considerably. Also, interleaving
stores with other instructions avoids throttling the store buffer, which can
hold at most 8 outstanding D-register stores.

Instructions on the Cortex-A8 are statically scheduled, hence there are two
varieties of loads and stores: one without an alignment restriction and one
with an alignment restriction (@64, @128 or @256 bits). There are no
intrinsics for specifying alignment.
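Because the alignment qualifier is fixed when the code is assembled, a C
wrapper could test the pointers at run time and call a routine assembled with
the matching qualifier. The sketch below (the helper name `vld1_align_bits`
is hypothetical) picks the widest qualifier that a VLD1/VST1 of one to four
D registers may use at a given address. It assumes the ARMv7 rules that @128
is only accepted with a two- or four-register list and @256 only with a
four-register list, while @64 is accepted with any list.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widest alignment qualifier (in bits) that a
 * VLD1/VST1 of nregs D registers may use at address p, or 0 when only
 * the unqualified form is safe. Assumes: @128 needs a 2- or 4-register
 * list, @256 needs a 4-register list, @64 is allowed with any list. */
static unsigned vld1_align_bits(const void *p, unsigned nregs)
{
    uintptr_t a = (uintptr_t)p;

    assert(nregs >= 1 && nregs <= 4);
    if (nregs == 4 && (a & 31) == 0)
        return 256;                       /* 32-byte aligned */
    if ((nregs == 2 || nregs == 4) && (a & 15) == 0)
        return 128;                       /* 16-byte aligned */
    if ((a & 7) == 0)
        return 64;                        /* 8-byte aligned */
    return 0;                             /* no qualifier */
}
```

For instance, a result of 128 for a two-register list would select a variant
written with {d0,d1},[...@128], and 0 would fall back to the unqualified form.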

So to get maximum memory bandwidth, one has to align the buffers and also use
the alignment specifiers in the load/store instructions.
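For frame buffers, aligning only the base pointer is not enough: every row
start must be aligned as well, so the stride has to be a multiple of the
alignment. A minimal sketch using POSIX posix_memalign follows; the names
`alloc_plane`, `aligned_stride` and `ALIGN` are hypothetical, and 16 bytes is
assumed because that is what @128 accesses need. Overflow of stride * height
is not checked.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGN 16 /* bytes; assumed target is @128 (16-byte) accesses */

/* Round a row width in bytes up to the next multiple of ALIGN. */
static size_t aligned_stride(size_t width)
{
    return (width + ALIGN - 1) & ~(size_t)(ALIGN - 1);
}

/* Hypothetical helper: allocate a width x height 8-bit plane whose base
 * address and every row start are ALIGN-byte aligned. Writes the padded
 * stride to *stride. Returns NULL on failure; release with free(). */
static uint8_t *alloc_plane(size_t width, size_t height, size_t *stride)
{
    void *p = NULL;

    *stride = aligned_stride(width);
    if (posix_memalign(&p, ALIGN, *stride * height) != 0)
        return NULL;
    assert(((uintptr_t)p & (ALIGN - 1)) == 0);
    return p;
}
```

With a 721-byte-wide plane, for example, the stride is padded to 736 so that
row n starts at base + n * 736, which stays 16-byte aligned.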

Example:
VLD1    {d0},[pSrc]        ;// takes 2 cycles
VLD1    {d0,d1},[p...@64]  ;// takes 1 cycle
VST1    {d0},[pDst]        ;// takes 2 cycles
VST1    {d0,d1},[p...@64]  ;// takes 1 cycle

VLD1    {d0,d1},[pSrc]     ;// takes 2 cycles
VLD1    {d0,d1},[p...@128] ;// takes 1 cycle
VST1    {d0,d1},[pDst]     ;// takes 2 cycles
VST1    {d0,d1},[p...@128] ;// takes 1 cycle
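To put the figures of the example above in perspective, here is the
arithmetic for copying 16-byte rows with one VLD1 {d0,d1} and one
VST1 {d0,d1} per row, using the 2 versus 1 cycle counts. It counts only
load/store issue cycles and ignores latency, cache misses and any surrounding
arithmetic, so it illustrates the best case rather than a measured speedup.
`copy16_cycles` is a hypothetical name.

```c
/* Issue cycles per VLD1/VST1 {d0,d1}, taken from the example above:
 * 2 cycles without a qualifier, 1 cycle with @128. */
enum { CYC_UNALIGNED = 2, CYC_ALIGNED = 1 };

/* Load/store issue cycles to copy `rows` rows of 16 bytes, one load and
 * one store per row. Latency, cache misses and other work are ignored. */
static unsigned copy16_cycles(unsigned rows, int src_aligned, int dst_aligned)
{
    unsigned ld = src_aligned ? CYC_ALIGNED : CYC_UNALIGNED;
    unsigned st = dst_aligned ? CYC_ALIGNED : CYC_UNALIGNED;

    return rows * (ld + st);
}
```

For a 16-row block this gives 64 cycles with no alignment qualifiers, 48 with
only the destination aligned and 32 with both source and destination aligned.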

For more code examples, see
http://www.arm.com/products/multimedia/openmax/index.html

Regards,
/G


--- On Wed, 9/16/09, Rémi Denis-Courmont <r...@videolan.org> wrote:

> From: Rémi Denis-Courmont <r...@videolan.org>
> Subject: [mpeg2-dev] [RFC] [PATCH] ARM Advanced SIMD motion compensation
> To: libmpeg2-devel@lists.sourceforge.net
> Date: Wednesday, September 16, 2009, 1:42 AM
> Hello all,
> 
> ARMv7 includes an optional "Advanced SIMD" instruction set, commercially
> known as NEON. This is included in the recent Cortex line of ARM processors.
> In particular, Cortex-A8 is found on TI-OMAP3xxx boards such as the
> BeagleBoard, and in the Nokia N900.
> 
> Attached is an initial patch against libmpeg2 trunk to use NEON for motion
> compensation. This is preliminary. There are a bunch of known CPU stalls.
> Those could probably be fixed using plain assembly and interleaving
> subsequent loads. Also, iDCT is not optimized. Anyway, here are my results
> with an OMAP3430 board:
> 
> With C, no acceleration:
> 7305 frames in 19.87 sec (367.64 fps), 155 last 0.50 sec (310.00 fps)
> 7308 frames decoded in 19.88 seconds (367.61 fps)
> 7288 frames in 19.88 sec (366.60 fps), 170 last 0.50 sec (340.00 fps)
> 7308 frames decoded in 19.95 seconds (366.32 fps)
> 
> With ARM acceleration (current libmpeg2):
> 7254 frames in 18.88 sec (384.22 fps), 180 last 0.50 sec (360.00 fps)
> 7308 frames decoded in 19.04 seconds (383.82 fps)
> 7263 frames in 18.88 sec (384.69 fps), 175 last 0.50 sec (350.00 fps)
> 7308 frames decoded in 19.02 seconds (384.23 fps)
> 
> With NEON acceleration (this patch):
> 7129 frames in 15.39 sec (463.22 fps), 245 last 0.50 sec (490.00 fps)
> 7308 frames decoded in 15.85 seconds (461.07 fps)
> 7127 frames in 15.38 sec (463.39 fps), 245 last 0.50 sec (490.00 fps)
> 7308 frames decoded in 15.85 seconds (461.07 fps)
> 
> So, there is already quite a big improvement!
> 
> I wonder if there is any guarantee on the memory alignment of some of the
> buffers? NEON can save one cycle per load/store if we use alignment-specific
> opcodes. Currently, the code assumes no alignment.
> 
> Comments welcome!
> 
> -- 
> Rémi Denis-Courmont
> http://git.remlab.net/cgi-bin/gitweb.cgi?p=vlc-courmisch.git;a=summary
> 
> -----Inline Attachment Follows-----
> 
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry&reg; Developer
> Conference in SF, CA
> is the only developer event you need to attend this year.
> Jumpstart your
> developing skills, take BlackBerry mobile applications to
> market and stay 
> ahead of the curve. Join us from November 9-12, 2009.
> Register now!
> http://p.sf.net/sfu/devconf
> -----Inline Attachment Follows-----
> 
> _______________________________________________
> Libmpeg2-devel mailing list
> Libmpeg2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/libmpeg2-devel
> 


      
