On 06/11/14 6:35 PM, Christophe Gisquet wrote:
> Hi,
> 
> 2014-11-06 21:48 GMT+01:00 James Almer <jamr...@gmail.com>:
>> 13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
>> 8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips
> 
> A couple of naïve questions (I haven't checked):
> Does it increase the alignment requirement?
> If yes, should it be notified somewhere (API bump, comment in the
> relevant header, ...)?

No, the function checks for alignment and jumps to a branch that uses
movdqu if needed.
ff_int32_to_float_a_avx also uses ymm regs and this same macro.
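
For illustration only, the kind of check described above can be sketched in C
roughly as follows. This is not the actual asm: pick_path, enum path and
vec_bytes are hypothetical names, and the sketch assumes both pointers are
tested against the vector width (16 bytes for xmm, 32 for ymm).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the two code paths. */
enum path { PATH_ALIGNED, PATH_UNALIGNED };

/* Hypothetical helper: decide which path a "_a" style function would take.
 * vec_bytes is the register width in bytes (16 for xmm, 32 for ymm). */
static enum path pick_path(const void *dst, const void *src, size_t vec_bytes)
{
    uintptr_t mask = (uintptr_t)vec_bytes - 1;

    /* The mask test only works for a nonzero power of two. */
    assert(vec_bytes && (vec_bytes & (vec_bytes - 1)) == 0);

    /* Any low bit set in either pointer means it is not vector-aligned,
     * so fall back to the branch that uses unaligned loads and stores. */
    if (((uintptr_t)dst & mask) || ((uintptr_t)src & mask))
        return PATH_UNALIGNED;
    return PATH_ALIGNED;
}
```

Since the fallback is chosen at run time, callers keep the alignment
guarantees they already had when a wider register size is added.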

That said, instructions using the VEX coding scheme don't require aligned
memory operands.
We could modify or duplicate these macros so the AVX versions don't do
unnecessary things like

movu  m0, [mem]
mulps m0, m1

when "mulps m0, m1, [mem]" would work just as well regardless of alignment.
The only instructions that still need alignment with the VEX scheme are of
course the explicitly aligned moves such as movdqa.

> 
>> x86inc.asm doesn't seem to handle cmpps or its aliases properly when
>> using avx.
> 
> I remember fixing its declaration (missing one parameter) while
> working on aac (maybe one year ago). It's unrelated, maybe?

If you use "cmpps m0, m1, 5" it will work for non-VEX coding, but error out
otherwise, since x86inc.asm turns that into "vcmpps m0, m1, 5" instead of
"vcmpps m0, m0, m1, 5".

With aliases like cmpnltps it doesn't even add the "v" prefix.
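
As a reminder of what the immediate means, here is a rough scalar C model of
one lane of cmpps. Predicate 5 is "not less than", which is why cmpnltps is
an alias for it. cmpps_lane and cmpnltps_lane are hypothetical names used
only for this sketch, not anything from x86inc.asm.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of cmpps with an immediate predicate (0..7).
 * Returns an all-ones mask when the predicate holds, zero otherwise. */
static uint32_t cmpps_lane(float a, float b, int pred)
{
    int unordered = (a != a) || (b != b);   /* true if either input is NaN */
    int r;

    assert(pred >= 0 && pred <= 7);
    switch (pred) {
    case 0:  r = (a == b);   break;         /* EQ    */
    case 1:  r = (a <  b);   break;         /* LT    */
    case 2:  r = (a <= b);   break;         /* LE    */
    case 3:  r = unordered;  break;         /* UNORD */
    case 4:  r = !(a == b);  break;         /* NEQ   */
    case 5:  r = !(a <  b);  break;         /* NLT   */
    case 6:  r = !(a <= b);  break;         /* NLE   */
    default: r = !unordered; break;         /* ORD   */
    }
    return r ? 0xFFFFFFFFu : 0u;
}

/* cmpnltps is cmpps with the predicate fixed to 5. */
static uint32_t cmpnltps_lane(float a, float b)
{
    return cmpps_lane(a, b, 5);
}
```

Note that the negated predicates, NLT included, are true when either input
is NaN, so they are not the same as swapping the operands of LE.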

> 
> Otherwise looks obvious.
> 

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
