On 06/19/2012 10:05 AM, Loren Merritt wrote:
> On Mon, 18 Jun 2012, Justin Ruggles wrote:
>
>> Adds optimized functions for mixing 3 through 8 input channels to 1 and 2
>> output channels in fltp or s16p format with flt coeffs.
>
> avx and fma4 should be usable for all of the functions, even if for some
> reason you can't use ymm.

Ah, true. I'll try it.

> If stack misalignment is a problem, then align the stack.

I tried various ways to do this, and the only way that worked was
requesting 32-byte starting stack alignment from gcc. But the docs do not
say that it's guaranteed when requested, nor do I know how to handle this
for other compilers. Is there some other good way to do this?

> matrix_on_stack is enabled in more cases than necessary. And even if you
> do need to spill some matrix coefs, spill only the excess, not all of
> them.

OK, I'll see if I can do that without too much complexity.

> Do the int16 functions need to use float accumulators, or would
> fixed-point math be sufficient?

In this case the coefficients are float, so it makes sense to use float
accumulators. I think trying to handle both coefficient types in this
macro would be way too much. I'll probably do a separate macro for 6 to 2
only with fixed-point coefficients.

> In all of the n_to_1 avx and fma4 funcs, there are some loads that could
> be memory args.

Ah, I think I see what you mean. When the matrix is in registers, switch
the order of the arguments to avoid the separate load?

> Unused DEFINE_MIX_3_6_TO_1_2.

Oops... leftover.

> Use pointers to the end of each array, rather than offsets from the first
> array. This allows one less sub inside the loop.

Hmm. I was following the pattern of the float_interleave functions, but
yeah, that sounds like it would work.

-Justin
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
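
For reference, the end-pointer suggestion can be sketched in plain C. This is only an illustration of the addressing pattern, not the asm from the patch: the function names, the 3-to-1 channel layout, and the coefficient array are all made up for the example, and a C compiler may well transform either loop on its own. The point is that when every pointer is moved to the end of its array and a single negative index counts up to zero, the index increment is the only arithmetic needed to both advance and terminate the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical baseline: index counts up from 0 and is compared with len. */
void mix_3_to_1_offsets(float *dst, const float *src0, const float *src1,
                        const float *src2, const float coef[3], ptrdiff_t len)
{
    assert(len >= 0);
    for (ptrdiff_t i = 0; i < len; i++)
        dst[i] = src0[i] * coef[0] + src1[i] * coef[1] + src2[i] * coef[2];
}

/* Hypothetical end-pointer variant: every pointer is advanced to the end of
 * its array once, before the loop, and one negative index runs from -len up
 * to 0. In asm, the add that bumps the index also sets the flags that end
 * the loop, so no separate counter update is required. */
void mix_3_to_1_endptr(float *dst, const float *src0, const float *src1,
                       const float *src2, const float coef[3], ptrdiff_t len)
{
    assert(len >= 0);
    const float *end0 = src0 + len;
    const float *end1 = src1 + len;
    const float *end2 = src2 + len;
    float *dst_end = dst + len;

    for (ptrdiff_t i = -len; i < 0; i++)
        dst_end[i] = end0[i] * coef[0] + end1[i] * coef[1] + end2[i] * coef[2];
}
```

Both functions compute the same per-sample expression in the same order, so they should produce identical output for the same input; only the way the loop walks memory differs.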