On Mon, 18 Jun 2012, Justin Ruggles wrote:

> Adds optimized functions for mixing 3 through 8 input channels to 1 and 2
> output channels in fltp or s16p format with flt coeffs.
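For context, here is a plain-C sketch of what the fltp variants compute. Each output sample is a weighted sum of the input channels at the same position, with one float coefficient per (output, input) pair. The function name, the `matrix[out][in]` layout and the separate output buffers are assumptions for illustration, not the patch's actual interface.

```c
#include <assert.h>

#define MAX_IN 8 /* the patch covers 3..8 input channels */

/* Hypothetical scalar reference: mix in_ch planar float channels down to
 * out_ch (1 or 2) planar float channels.
 * matrix[o][i] is the gain applied to input channel i for output o. */
static void mix_fltp_ref(float **out, const float *const *in,
                         const float matrix[2][MAX_IN],
                         int out_ch, int in_ch, int len)
{
    assert(out_ch >= 1 && out_ch <= 2);
    assert(in_ch >= 1 && in_ch <= MAX_IN);
    for (int o = 0; o < out_ch; o++) {
        for (int n = 0; n < len; n++) {
            float sum = 0.0f;
            /* weighted sum over all input planes at sample n */
            for (int i = 0; i < in_ch; i++)
                sum += in[i][n] * matrix[o][i];
            out[o][n] = sum;
        }
    }
}
```

The SIMD versions vectorize the inner `n` loop, so each coefficient is broadcast once and reused across a whole register of samples.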
avx and fma4 should be usable for all of the functions, even if for some reason you can't use ymm. If stack misalignment is a problem, then align the stack.

matrix_on_stack is enabled in more cases than necessary. And even if you do need to spill some matrix coefs, spill only the excess, not all of them.

Do the int16 functions need to use float accumulators, or would fixed-point math be sufficient?

In all of the n_to_1 avx and fma4 funcs, there are some loads that could be memory args.

Unused DEFINE_MIX_3_6_TO_1_2.

Use pointers to the end of each array, rather than offsets from the first array. This allows one fewer sub inside the loop.

--Loren Merritt
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
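The suggestion to use pointers to the end of each array is the usual x86 loop idiom: advance every plane pointer to one past its last element and address all of them with a single negative index that counts up to zero. In asm, the one `add` that updates the index also sets the flags for the loop branch, so no separate counter `sub` is needed. A C sketch of that structure for a 3-to-1 fltp mix follows; the function name and argument order are hypothetical, and a C compiler is free to lower it differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3-to-1 planar float mix using the "pointer to end,
 * negative index" idiom. Every plane is addressed through the same
 * counter i, which runs from -len up to 0, so one increment-and-test
 * per iteration drives the whole loop. */
static void mix_3_to_1_fltp_endptr(float *dst, const float *src0,
                                   const float *src1, const float *src2,
                                   const float coef[3], ptrdiff_t len)
{
    assert(len >= 0);
    /* point every array one past its last element */
    dst  += len;
    src0 += len;
    src1 += len;
    src2 += len;
    for (ptrdiff_t i = -len; i < 0; i++)
        dst[i] = src0[i] * coef[0] + src1[i] * coef[1] + src2[i] * coef[2];
}
```

Compared with keeping one base pointer plus per-array offsets, each array here needs only its own base register, and the single index register is shared by all of them.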
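On the question of whether the int16 functions need float accumulators: one common fixed-point alternative is to convert the float coefficients to integers once, accumulate the int16 × coefficient products in 32-bit integers, then round, shift back and saturate. The sketch below assumes Q8 coefficients (8 fractional bits), round-to-nearest and clipping to int16; those choices and all helper names are hypothetical, not what the patch does. Whether 8 bits of coefficient precision is enough is exactly the trade-off the question raises.

```c
#include <assert.h>
#include <stdint.h>

#define Q_BITS 8 /* assumed fractional bits of the fixed-point coefficients */

/* Convert a float gain to Q8, rounding halves away from zero. */
static int32_t coef_to_q8(float c)
{
    return (int32_t)(c * (1 << Q_BITS) + (c >= 0.0f ? 0.5f : -0.5f));
}

/* Saturate a 32-bit value to the int16 range. */
static int16_t clip_int16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical N-to-1 s16p mix using only integer arithmetic.
 * coef_q8[i] is the Q8 gain for input plane i. With |gain| near 1 and at
 * most 8 inputs, the int32 accumulator has ample headroom. */
static void mix_n_to_1_s16p_q8(int16_t *dst, const int16_t *const *src,
                               const int32_t *coef_q8, int in_ch, int len)
{
    assert(in_ch >= 1 && in_ch <= 8);
    for (int n = 0; n < len; n++) {
        /* half an LSB, so the shift rounds to nearest (ties upward) */
        int32_t acc = 1 << (Q_BITS - 1);
        for (int i = 0; i < in_ch; i++)
            acc += src[i][n] * coef_q8[i];
        /* right shift of a negative int is implementation-defined in C,
         * but arithmetic on mainstream compilers */
        dst[n] = clip_int16(acc >> Q_BITS);
    }
}
```

In SIMD terms this maps onto 16-bit multiplies with 32-bit accumulation (for example `pmaddwd`), which avoids the int-to-float and float-to-int conversions around each block.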