Hello,

> I used GCC 7.2. clear_blocks_mmx is slower than c for me as well, but
> not the rest.
> Your compiler seems to have done a much better job than mine. Is it
> Clang? Does it somehow have vectorization enabled perhaps? Because
> that's not supposed to happen.
>

Yes, it's Clang 8.1.

I put the clear_blocks_c function in a file and ran

    clang -S -O1 test_asm_gen.c

The resulting asm is:

            .section        __TEXT,__text,regular,pure_instructions
            .macosx_version_min 10, 12
            .globl  _clear_blocks_c
            .p2align        4, 0x90
    _clear_blocks_c:                        ## @clear_blocks_c
            .cfi_startproc
    ## BB#0:
            pushq   %rbp
    Ltmp0:
            .cfi_def_cfa_offset 16
    Ltmp1:
            .cfi_offset %rbp, -16
            movq    %rsp, %rbp
    Ltmp2:
            .cfi_def_cfa_register %rbp
            movl    $768, %esi              ## imm = 0x300
            callq   ___bzero
            popq    %rbp
            retq
            .cfi_endproc

    .subsections_via_symbols

So it seems Clang replaces the body of clear_blocks_c with a call to an
optimized library function (___bzero, with a size of 768 bytes).

> > I also modify several decoder/encoder, in order to fix the
> > DECLARE_ALIGNED from 16 to 32
> >
> > I run make fate SAMPLES=fate-suite/
> > i have several errors, but after a check, these errors
> > doesn't seems to be related to this patch
>
> Make sure to clean your build folder if you recently pulled new commits
> from the git repository. Reconfigure if necessary.
>

OK, I reran it, and the fate tests now pass.

2017-10-02 4:05 GMT+02:00 Ronald S. Bultje <rsbul...@gmail.com>:

> Hi,
>
> On Sun, Oct 1, 2017 at 7:46 PM, Martin Vignali <martin.vign...@gmail.com>
> wrote:
>
> > I also modify several decoder/encoder, in order to fix the
> > DECLARE_ALIGNED from 16 to 32
>
> How did you decide which ones to change?
>
> Ronald
>

After running the fate tests, it looks like tests fail only when
LOCAL_ALIGNED_16 or DECLARE_ALIGNED(16, ...) is used to declare a block
variable, and not in other cases.

Using git grep clear_block, I checked all the files that use this function
and changed LOCAL_ALIGNED_16 to LOCAL_ALIGNED_32,
or DECLARE_ALIGNED(16, ...) to DECLARE_ALIGNED(32, ...).

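For reference, here is a minimal sketch of the kind of test file and block declaration discussed above. It assumes clear_blocks_c is a plain memset over six 8x8 blocks of int16_t, which is consistent with the 768-byte size passed to ___bzero in the listing. The names clear_blocks_sketch, is_aligned and check_clear_blocks are hypothetical, and C11 _Alignas stands in for the real DECLARE_ALIGNED / LOCAL_ALIGNED_32 macros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 6 blocks of 8x8 int16_t coefficients. */
enum { BLOCK_COEFFS = 64, NUM_BLOCKS = 6 };

/* 6 * 64 * 2 bytes = 768, the size seen in the asm listing. */
static_assert(sizeof(int16_t) * NUM_BLOCKS * BLOCK_COEFFS == 768,
              "block buffer is expected to be 768 bytes");

/* Hypothetical stand-in for the C reference function: a plain memset,
 * which a compiler may lower to a bzero/memset library call. */
static void clear_blocks_sketch(int16_t *blocks)
{
    memset(blocks, 0, sizeof(int16_t) * NUM_BLOCKS * BLOCK_COEFFS);
}

/* Returns 1 if ptr is a multiple of `alignment` bytes, 0 otherwise. */
static int is_aligned(const void *ptr, uintptr_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Declares a 32-byte aligned block buffer (C11 _Alignas used here in
 * place of the FFmpeg macros), fills it with a non-zero pattern,
 * clears it, and returns 1 if the buffer is aligned and fully zeroed. */
static int check_clear_blocks(void)
{
    _Alignas(32) int16_t blocks[NUM_BLOCKS * BLOCK_COEFFS];

    memset(blocks, 0x55, sizeof(blocks));
    if (!is_aligned(blocks, 32))
        return 0;

    clear_blocks_sketch(blocks);
    for (size_t i = 0; i < NUM_BLOCKS * BLOCK_COEFFS; i++)
        if (blocks[i] != 0)
            return 0;
    return 1;
}
```

A buffer declared with only 16-byte alignment may or may not land on a 32-byte boundary, which would fit with the fate failures showing up only for block variables declared with LOCAL_ALIGNED_16 or DECLARE_ALIGNED(16, ...).
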
Martin