On 11/07/2011 06:26 PM, Loren Merritt wrote:

> On Mon, 7 Nov 2011, Justin Ruggles wrote:
> 
>> +.loop:
>> +    movu            m1, [v1q+offsetq]
>> +    mulps           m1, m1, [v2q+offsetq]
>> +    addps           m0, m0, m1
>> +    add        offsetq, mmsize
>>      js           .loop
> 
> addps has a latency of 3 or 4 cycles, whereas the loop should take 1 or
> 2 cycles per iteration just counting uops. Thus it's latency bound and
> could be improved by using multiple accumulators.
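
For illustration, the multiple-accumulator idea can be sketched in plain C
rather than in the asm of the patch. This is only a scalar sketch under
stated assumptions: the function names are hypothetical, the length is
assumed to be a multiple of 4, and the two versions add in a different
order, so results may differ in the last bits for general float input.

```c
#include <stddef.h>

/* Single accumulator: every add depends on the previous add, so the loop
 * cannot run faster than one add latency per iteration. */
float scalarproduct_1acc(const float *v1, const float *v2, size_t len)
{
    float sum = 0.0f;
    for (size_t i = 0; i < len; i++)
        sum += v1[i] * v2[i];
    return sum;
}

/* Four independent accumulators: the four dependency chains do not wait
 * on each other, so their adds can overlap in the pipeline.
 * Assumption: len is a multiple of 4. */
float scalarproduct_4acc(const float *v1, const float *v2, size_t len)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    for (size_t i = 0; i < len; i += 4) {
        s0 += v1[i + 0] * v2[i + 0];
        s1 += v1[i + 1] * v2[i + 1];
        s2 += v1[i + 2] * v2[i + 2];
        s3 += v1[i + 3] * v2[i + 3];
    }
    /* Combine the partial sums once, after the loop. */
    return (s0 + s1) + (s2 + s3);
}
```

In the SIMD loop the same idea would mean keeping two or more vector
registers as running sums and adding them together after the loop.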


I just realized that the only use of this function we currently have is
in aacdec, which requires it to work with a length that is a multiple of
4. I couldn't even find a sample that triggers the function (I had to
insert a dummy call to test it). So I'll drop the AVX part for now.

I have another use for this function in the AC-3 encoder (per-band
energy calculation), but that use requires the function to handle both
unaligned input and arbitrary lengths. So I'll put that on my TODO list
and revisit the AVX part at that time.
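
As a rough sketch of what "unaligned input and arbitrary lengths" implies,
here is one possible shape in C with SSE intrinsics. It is not the planned
implementation: the function name is hypothetical, and it assumes an x86
target with SSE. It uses unaligned loads for the vector body and a scalar
loop for the leftover 0 to 3 elements.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86 only */

/* Hypothetical helper: dot product of two float arrays with no alignment
 * or length restrictions. */
float scalarproduct_unaligned(const float *v1, const float *v2, size_t len)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;

    /* Vector body: 4 floats per step, using unaligned loads. */
    for (; i + 4 <= len; i += 4) {
        __m128 a = _mm_loadu_ps(v1 + i);
        __m128 b = _mm_loadu_ps(v2 + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(a, b));
    }

    /* Horizontal sum of the four lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    /* Scalar tail: the remaining 0..3 elements. */
    for (; i < len; i++)
        sum += v1[i] * v2[i];

    return sum;
}
```

An AVX version would follow the same pattern with 8 floats per step and a
tail of up to 7 elements.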

-Justin

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel