On 10/3/2017 4:47 PM, Martin Vignali wrote:
> Hello,
>
>> I used GCC 7.2. clear_blocks_mmx is slower than c for me as well, but
>> not the rest.
>> Your compiler seems to have done a much better job than mine. Is it
>> Clang? Does it somehow have vectorization enabled perhaps? Because
>> that's not supposed to happen.
>
> Yes, it's Clang 8.1.
>
> I put the clear_blocks_c function in a file and ran
> clang -S -O1 test_asm_gen.c
>
> The asm result is
>     .section __TEXT,__text,regular,pure_instructions
>     .macosx_version_min 10, 12
>     .globl _clear_blocks_c
>     .p2align 4, 0x90
> _clear_blocks_c: ## @clear_blocks_c
>     .cfi_startproc
> ## BB#0:
>     pushq %rbp
> Ltmp0:
>     .cfi_def_cfa_offset 16
> Ltmp1:
>     .cfi_offset %rbp, -16
>     movq %rsp, %rbp
> Ltmp2:
>     .cfi_def_cfa_register %rbp
>     movl $768, %esi ## imm = 0x300
>     callq ___bzero
>     popq %rbp
>     retq
>     .cfi_endproc
>
> .subsections_via_symbols
>
> Seems like an optimized function is called for clear_blocks_c.

Yeah, the C version uses memset. Guess the implementation clang ends up
calling there (___bzero in your asm) is good.

Patch pushed. Thanks.
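
For anyone reading this in the archive, here is a minimal sketch of what a memset-based clear looks like. It is not a copy of the tree: the function name, the helper and the two macros are made up for illustration, and the layout (six 8x8 blocks of int16_t) is an assumption, chosen because it gives the 768-byte length that the quoted asm passes to ___bzero.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: six 8x8 blocks of 16-bit coefficients,
 * i.e. 6 * 64 * 2 = 768 bytes (the $768 in the quoted asm). */
#define NUM_BLOCKS   6
#define BLOCK_COEFFS 64

/* Hypothetical stand-in for the C fallback: a single memset over all
 * blocks. With a constant size and a zero fill value, the compiler is
 * free to lower this to whatever zeroing routine the platform offers. */
static void clear_blocks_sketch(int16_t *blocks)
{
    memset(blocks, 0, sizeof(int16_t) * NUM_BLOCKS * BLOCK_COEFFS);
}

/* Hypothetical helper: returns 1 if every coefficient is zero, else 0. */
static int blocks_are_clear(const int16_t *blocks)
{
    for (int i = 0; i < NUM_BLOCKS * BLOCK_COEFFS; i++)
        if (blocks[i] != 0)
            return 0;
    return 1;
}
```

Since the call is a plain memset, how fast it runs depends on the compiler and C library in use, which would be consistent with the different results we saw with GCC 7.2 and Clang.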