On NEON alignment: there can be a lot of improvement in performance if buffers are aligned. Also, interleaving stores with other instructions avoids throttling the store buffer; the maximum number of outstanding stores is 8 D registers.
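As a minimal sketch of what aligning buffers could look like on the C side, the code below allocates 16-byte-aligned buffers (what the @128 qualifier requires; @64 needs 8 bytes) and checks at run time whether a block routine could take an aligned path. It assumes a POSIX system providing posix_memalign, and the helper names alloc_aligned, is_aligned and can_use_aligned_path are hypothetical, not part of libmpeg2. The stride check matters because row y starts at base + y * stride, so an aligned base alone does not keep later rows aligned.

```c
#define _POSIX_C_SOURCE 200112L /* assumption: POSIX system, for posix_memalign */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `size` bytes starting at an address that is
 * a multiple of `align`. `align` must be a power of two and a multiple of
 * sizeof(void *); 16 bytes matches the @128 load/store qualifier.
 * Release the buffer with free(). Returns NULL on failure. */
static uint8_t *alloc_aligned(size_t size, size_t align)
{
    void *p = NULL;
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return (uint8_t *)p;
}

/* Nonzero if `p` is a multiple of `align` bytes (`align` a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical dispatch test: nonzero only if every row of both blocks
 * starts on a 16-byte boundary, i.e. both base pointers are aligned and
 * the stride is a multiple of 16. */
static int can_use_aligned_path(const uint8_t *dst, const uint8_t *src,
                                ptrdiff_t stride)
{
    return is_aligned(dst, 16) && is_aligned(src, 16) && (stride % 16) == 0;
}
```

A caller would pick between an aligned and an unaligned NEON routine based on can_use_aligned_path(). Reference pointers displaced by motion vectors will often fail the check, so an unaligned routine is still needed as a fallback.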
On the Cortex-A8, instructions are statically scheduled, and hence there are two varieties of loads and stores: one without an alignment qualifier and one with an alignment qualifier (@64, @128 or @256 bits). There are no intrinsics for specifying alignment. So, to get maximum memory bandwidth, one has to align the buffers and also use the alignment qualifiers on the instructions. Example:

    VLD1 {d0},[pSrc]          ;// takes 2 cycles
    VLD1 {d0,d1},[p...@64]    ;// takes 1 cycle
    VST1 {d0},[pDst]          ;// takes 2 cycles
    VST1 {d0,d1},[p...@64]    ;// takes 1 cycle
    VLD1 {d0,d1},[pSrc]       ;// takes 2 cycles
    VLD1 {d0,d1},[p...@128]   ;// takes 1 cycle
    VST1 {d0,d1},[pDst]       ;// takes 2 cycles
    VST1 {d0,d1},[p...@128]   ;// takes 1 cycle

For more code examples, see http://www.arm.com/products/multimedia/openmax/index.html

Regards,
/G

--- On Wed, 9/16/09, Rémi Denis-Courmont <r...@videolan.org> wrote:

> From: Rémi Denis-Courmont <r...@videolan.org>
> Subject: [mpeg2-dev] [RFC] [PATCH] ARM Advanced SIMD motion compensation
> To: libmpeg2-devel@lists.sourceforge.net
> Date: Wednesday, September 16, 2009, 1:42 AM
>
> Hello all,
>
> ARMv7 includes an optional "Advanced SIMD" instruction set, commercially
> known as NEON. This is included in the recent Cortex line of ARM processors.
> In particular, Cortex-A8 is found on TI-OMAP3xxx boards such as BeagleBoard,
> or the Nokia N900.
>
> Attached is an initial patch against libmpeg2 trunk to use NEON for motion
> compensation. This is preliminary. There are a bunch of known CPU stalls.
> Those could probably be fixed using plain assembly and interleaving
> subsequent loads. Also, iDCT is not optimized.
> Anyway, here are my results with an OMAP3430 board:
>
> With C, no acceleration:
> 7305 frames in 19.87 sec (367.64 fps), 155 last 0.50 sec (310.00 fps)
> 7308 frames decoded in 19.88 seconds (367.61 fps)
>
> 7288 frames in 19.88 sec (366.60 fps), 170 last 0.50 sec (340.00 fps)
> 7308 frames decoded in 19.95 seconds (366.32 fps)
>
> With ARM acceleration (current libmpeg2):
> 7254 frames in 18.88 sec (384.22 fps), 180 last 0.50 sec (360.00 fps)
> 7308 frames decoded in 19.04 seconds (383.82 fps)
> 7263 frames in 18.88 sec (384.69 fps), 175 last 0.50 sec (350.00 fps)
> 7308 frames decoded in 19.02 seconds (384.23 fps)
>
> With NEON acceleration (this patch):
> 7129 frames in 15.39 sec (463.22 fps), 245 last 0.50 sec (490.00 fps)
> 7308 frames decoded in 15.85 seconds (461.07 fps)
> 7127 frames in 15.38 sec (463.39 fps), 245 last 0.50 sec (490.00 fps)
> 7308 frames decoded in 15.85 seconds (461.07 fps)
>
> So, there is already quite a big improvement!
>
> I wonder if there is any guarantee on the memory alignment of some of the
> buffers? NEON can save one cycle per load/store if we use alignment-specific
> opcodes. Currently, the code assumes no alignment.
>
> Comments welcome!
>
> --
> Rémi Denis-Courmont
> http://git.remlab.net/cgi-bin/gitweb.cgi?p=vlc-courmisch.git;a=summary
>
> -----Inline Attachment Follows-----
>
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry® Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay
> ahead of the curve. Join us from November 9-12, 2009. Register now!
> http://p.sf.net/sfu/devconf
> -----Inline Attachment Follows-----
>
> _______________________________________________
> Libmpeg2-devel mailing list
> Libmpeg2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/libmpeg2-devel