On Tue, Nov 8, 2011 at 3:09 PM, Justin Ruggles <[email protected]> wrote:
> On 11/07/2011 06:26 PM, Loren Merritt wrote:
>
>> On Mon, 7 Nov 2011, Justin Ruggles wrote:
>>
>>> +.loop:
>>> +    movu    m1, [v1q+offsetq]
>>> +    mulps   m1, m1, [v2q+offsetq]
>>> +    addps   m0, m0, m1
>>> +    add     offsetq, mmsize
>>>     js .loop
>>
>> addps had latency 3 or 4, whereas the loop should be 1 or 2 cycles per
>> iteration just counting uops. Thus it's latency bound and could be
>> improved by multiple accumulators.
>
>
> I just realized that the only use of this function we have currently is
> in aacdec and requires it to work with a length with multiple of 4. I
> couldn't even find a sample that triggers the function (I had to insert
> a dummy call to test it). So I'll drop the AVX part for now.
>
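
For reference, the multiple-accumulator idea Loren mentions could look roughly like the C sketch below. This is only an illustration, not the libav code: the function name scalarproduct_4acc is hypothetical, and it assumes len is a multiple of 4, as in the aacdec use case. Keeping four independent partial sums means each add no longer waits on the result of the previous one, so the adds can overlap instead of forming a single dependency chain.

```c
#include <stddef.h>

/* Hypothetical illustration, not the libav function.
 * Dot product of v1 and v2 using four independent accumulators.
 * Assumes len is a multiple of 4. */
static float scalarproduct_4acc(const float *v1, const float *v2, size_t len)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i;

    for (i = 0; i < len; i += 4) {
        /* Each accumulator has its own dependency chain. */
        acc0 += v1[i + 0] * v2[i + 0];
        acc1 += v1[i + 1] * v2[i + 1];
        acc2 += v1[i + 2] * v2[i + 2];
        acc3 += v1[i + 3] * v2[i + 3];
    }

    /* Combine the partial sums once, after the loop. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

In the SIMD version the same thing would presumably be done with two or more vector registers as accumulators, summed after the loop. Note that the result can differ slightly from a single-accumulator loop because the floating-point additions are reordered.
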
http://streams.videolan.org/Mpeg_Conformance/ftp.iis.fhg.de/mpeg4audio-conformance/compressedMp4/al1[89]_*.mp4

--Alex
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
