On 06/19/2012 10:05 AM, Loren Merritt wrote:
> On Mon, 18 Jun 2012, Justin Ruggles wrote:
> 
>> Adds optimized functions for mixing 3 through 8 input channels to 1 and 2
>> output channels in fltp or s16p format with flt coeffs.
> 
> avx and fma4 should be usable for all of the functions, even if for some
> reason you can't use ymm.

Ah, true. I'll try it.

> If stack misalignment is a problem, then align the stack.

I tried various ways to do this, and the only way that worked was
requesting 32-byte starting stack alignment from gcc. But the docs do
not say that it's guaranteed when requested, nor do I know how to handle
this for other compilers. Is there some other good way to do this?
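
For illustration, one approach that does not depend on the compiler is to over-allocate the scratch area and round its address up by hand. In the asm itself the usual equivalent is to save the original stack pointer and mask it down to a 32-byte boundary. Below is a minimal C sketch of the rounding idea only; align_ptr_up and sum8_via_aligned_scratch are hypothetical names, not functions from the patch, and the sketch assumes a 4-byte float.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round ptr up to the next multiple of align (align must be a power of two). */
static void *align_ptr_up(void *ptr, size_t align)
{
    uintptr_t p = (uintptr_t)ptr;
    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)((p + align - 1) & ~(uintptr_t)(align - 1));
}

/* Hypothetical example: copy 8 floats into a 32-byte aligned scratch
 * area carved out of an ordinary (possibly misaligned) stack array. */
static float sum8_via_aligned_scratch(const float *src)
{
    /* A float array is at least 4-byte aligned (assuming 4-byte float),
     * so at most 28 bytes (7 floats) of padding are needed to reach a
     * 32-byte boundary. */
    float raw[8 + 7];
    float *scratch = align_ptr_up(raw, 32);
    float sum = 0.0f;
    int i;

    for (i = 0; i < 8; i++)
        scratch[i] = src[i];
    for (i = 0; i < 8; i++)
        sum += scratch[i];
    return sum;
}
```

The cost is a few wasted bytes and one add/and pair per call, which should be negligible next to the mixing loop.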

> matrix_on_stack is enabled in more cases than necessary. And even if you
> do need to spill some matrix coefs, spill only the excess, not all of
> them.

OK, I'll see if I can do that without too much complexity.

> Do the int16 functions need to use float accumulators, or would
> fixed-point math be sufficient?

In this case the coefficients are float, so it makes sense to use float
accumulators. I think trying to handle both coefficient types in this
macro would be way too much. I'll probably do a separate macro for 6 to
2 only with fixed-point coefficients.
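
To make the float-accumulator case concrete, here is a scalar C sketch of what an n-to-1 mix of planar int16 input with float coefficients computes. It is only a reference model under assumptions: the function name and signature are hypothetical, and it rounds half away from zero, which may differ from the rounding mode the SIMD conversion uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: mix in_ch planar s16 channels into one
 * output channel, accumulating in float because the coefficients are float. */
static void mix_n_to_1_s16p_flt_ref(int16_t *dst, const int16_t *const *src,
                                    const float *coeff, int in_ch, int len)
{
    int i, ch;

    assert(in_ch > 0 && len >= 0);
    for (i = 0; i < len; i++) {
        float acc = 0.0f;

        for (ch = 0; ch < in_ch; ch++)
            acc += (float)src[ch][i] * coeff[ch];

        /* Clip in the float domain first so the int conversion cannot overflow. */
        if (acc > 32767.0f)
            acc = 32767.0f;
        else if (acc < -32768.0f)
            acc = -32768.0f;

        /* Round half away from zero, then truncate toward zero. */
        dst[i] = (int16_t)(int)(acc + (acc >= 0.0f ? 0.5f : -0.5f));
    }
}
```

A fixed-point variant would replace the float accumulator with an integer one and a final shift, which is why it fits better in a separate macro.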

> In all of the n_to_1 avx and fma4 funcs, there are some loads that could
> be memory args.

Ah, I think I see what you mean. When the matrix is in registers, switch
the order of the arguments to avoid the separate load?

> Unused DEFINE_MIX_3_6_TO_1_2.

oops... leftover.

> Use pointers to the end of each array, rather than offsets from the first
> array. This allows one less sub inside the loop.

Hmm. I was following the pattern of the float_interleave functions, but
yeah, that sounds like it would work.
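
As I understand the suggestion, each pointer is advanced to the end of its array once, before the loop, and a single negative index then counts up to zero, so the index update also serves as the loop test. A small C sketch of that addressing pattern follows; the function name is hypothetical, and a C compiler is of course free to generate different code than the hand-written asm would.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2-to-1 float mix using end pointers and one negative index. */
static void mix_2_to_1_fltp_endptr(float *dst, const float *src0,
                                   const float *src1, float c0, float c1,
                                   ptrdiff_t len)
{
    const float *s0_end;
    const float *s1_end;
    float *d_end;
    ptrdiff_t i;

    assert(len >= 0);

    /* Point every array at its end once, outside the loop. */
    s0_end = src0 + len;
    s1_end = src1 + len;
    d_end  = dst  + len;

    /* i runs from -len up to 0; reaching zero ends the loop, so no
     * separate length counter has to be decremented. */
    for (i = -len; i < 0; i++)
        d_end[i] = s0_end[i] * c0 + s1_end[i] * c1;
}
```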

-Justin
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
