Hello, > > > Checkasm result (Kaby Lake, os 10.12) > > restore_rgb_planes_c: 8371.0 > > restore_rgb_planes_sse2: 6583.7 > > restore_rgb_planes_avx2: 3596.5 > > > > restore_rgb_planes10_c: 16735.7 > > restore_rgb_planes10_sse2: 11478.5 > > restore_rgb_planes10_avx2: 7193.7 > > Curious, on my Haswell (mingw-w64 Win10) i get > > restore_rgb_planes_c: 79500.7 > restore_rgb_planes_sse2: 6872.7 > restore_rgb_planes_avx2: 6715.7 > > restore_rgb_planes10_c: 91394.7 > restore_rgb_planes10_sse2: 14494.0 > restore_rgb_planes10_avx2: 13468.7 > > I check again, i have the same kind of result, than before Strange, that the speed improvment is so small in Haswell
> > > > > > Pass fate test for me > > > > > > 0001-checkasm-add-utvideodsp-test : > > I'm not entirely sure of mine, for this checkasm, > > > > 0002-libavcodec-x86-utvideodsp-make-macro-for-func > > Code reorganization > > > > 0003-libavcodec-utvideodsp-add-avx2-version-for-the-dsp > > AVX2 version > > > > 0004-libavcodec-x86-utvideodsp.asm-cosmetic > > Cosmetic > > > > Martin > > Jokyo Images > > Sorry i missed this set. The asm changes look simple and good. Only > thing I'd have done was making sure the constants were wide enough to > avoid having to use vpbroadcast instructions. > I noticed for that matter that said constants already exist in > constants.c, so i just made it use them instead. > Thanks for all the fix. Your comments, for the use of vpbroadcast for constantes load, seems similar to a previous comment by James Darnley (in discussion libavcodec/bswapdsp : add AVX2 for bswap_buf) I use here the same way use by Henrik Gramner in exr_dsp.predictor func (but i'm ok to modify that part if need) Do you think we need to replace all %if cpuflag(avx2) vbroadcasti128 mm, [constantes] %else mova mm, [constantes] %endif by your method ? (for exr_dsp, the answer is probably yes, because it's also use pb_80 (i will send a patch for that)) If yes, is it better to use in asm (for example for bswapdsp) SECTION_RODATA 32 pb_bswap32: times 2 db 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 or adding a constantes (if not exists), in constant.c/h ? Seems like this case will be common for AVX2 version of dsp func. > > The checkasm test is a bit ugly and could use some cosmetics, though. > > Except one thing, (WIDTH_PADDED calc is strange (doesn't remember why i write this, and only works by "luck"), need to be WIDTH + 16 Do you think, it's need more modification (considering your recent patchs) ? Martin _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel