On Tue, Nov 8, 2011 at 3:09 PM, Justin Ruggles <[email protected]> wrote:
> On 11/07/2011 06:26 PM, Loren Merritt wrote:
>
>> On Mon, 7 Nov 2011, Justin Ruggles wrote:
>>
>>> +.loop:
>>> +    movu            m1, [v1q+offsetq]
>>> +    mulps           m1, m1, [v2q+offsetq]
>>> +    addps           m0, m0, m1
>>> +    add        offsetq, mmsize
>>>      js           .loop
>>
>> addps has a latency of 3 or 4 cycles, whereas the loop should take 1 or 2
>> cycles per iteration counting uops alone. It is therefore latency-bound
>> and could be improved by using multiple accumulators.
>
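
For reference, a minimal sketch of the multiple-accumulator idea in plain C
rather than in the patch's asm. The names dot_single and dot_multi are
hypothetical and are not the libav function; the length is assumed to be a
multiple of 4. With a single accumulator every add has to wait for the
previous one, while four independent sums let the adds overlap:

```c
#include <stddef.h>

/* Hypothetical reference version: one accumulator, so each add
 * depends on the result of the previous add. */
static float dot_single(const float *v1, const float *v2, size_t len)
{
    float sum = 0.0f;
    for (size_t i = 0; i < len; i++)
        sum += v1[i] * v2[i];
    return sum;
}

/* Hypothetical variant with four independent accumulators.
 * Assumes len is a multiple of 4. The four dependency chains do not
 * touch each other, so the add latency of one chain can be hidden
 * behind work on the others. */
static float dot_multi(const float *v1, const float *v2, size_t len)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    for (size_t i = 0; i < len; i += 4) {
        s0 += v1[i + 0] * v2[i + 0];
        s1 += v1[i + 1] * v2[i + 1];
        s2 += v1[i + 2] * v2[i + 2];
        s3 += v1[i + 3] * v2[i + 3];
    }
    /* Combine the partial sums once, after the loop. */
    return (s0 + s1) + (s2 + s3);
}
```

The summation order differs from the single-accumulator version, so the
results can differ in the last bits for general float input.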
>
> I just realized that the only use of this function we currently have is
> in aacdec, and it requires the function to work with a length that is a
> multiple of 4.  I couldn't even find a sample that triggers the function
> (I had to insert a dummy call to test it). So I'll drop the AVX part for now.
>

http://streams.videolan.org/Mpeg_Conformance/ftp.iis.fhg.de/mpeg4audio-conformance/compressedMp4/al1[89]_*.mp4

--Alex
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
