Hello again,
I've found some interesting things. GDB shows no difference between the
AVCodecContext instance in my application and the one in avplay. However,
oprofile gives me the following results for avplay and my application
(again under the same conditions):

for my application:
samples  %        linenr info             symbol name
*875253   33.5064  h264_deblock.asm:393    ff_deblock_v_luma_8_avx*
182945   7.0035   h264_cabac.c:1768       decode_cabac_residual_nondc_internal
164362   6.2921   h264_cabac.c:1881       ff_h264_decode_mb_cabac
162307   6.2134   cabac.h:167             get_cabac
110644   4.2357   h264_deblock.asm:769    ff_deblock_v_luma_intra_8_avx
79413    3.0401   cabac.h:167             get_cabac_noinline
...

for avplay:
samples  %        linenr info             symbol name
131335   11.5650  h264_cabac.c:1768       decode_cabac_residual_nondc_internal
116488   10.2576  cabac.h:167             get_cabac
113620   10.0051  h264_cabac.c:1881       ff_h264_decode_mb_cabac
55205    4.8612   cabac.h:167             get_cabac_noinline
47232    4.1591   h264_i386.h:46          decode_significance_x86
41296    3.6364   h264_mvpred.h:443       fill_decode_caches
40602    3.5753   h264_qpel_8bit.asm:469  ff_put_h264_qpel8or16_v_lowpass_sse2
38861    3.4220   h264.c:347              await_references
35460    3.1225   h264.c:3513             loop_filter
35173    3.0972   h264_mb_template.c:42   hl_decode_mb_simple_8
...
*17207    1.5152   h264_deblock.asm:393    ff_deblock_v_luma_8_avx*
...

It's clear the problem is the time spent inside the
ff_deblock_v_luma_8_avx function. My application spends approximately 33.5%
of its CPU time in that function, whereas avplay spends only about 1.5%
there. (The number of invocations of ff_deblock_v_luma_8_avx is exactly
the same for both applications.)
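
To put a number on the gap, the raw sample counts can be compared directly. This is only a rough sketch: it assumes both oprofile runs used the same event and sampling period, so that a sample stands for the same amount of CPU time in each run. The helper name sample_ratio is made up for illustration; the counts are the ones from the tables above.

```c
#include <stdio.h>

/* Ratio of oprofile sample counts for one symbol in two runs.
 * Only meaningful if both runs used the same event and sampling period
 * (an assumption, not something the profiles themselves prove). */
double sample_ratio(unsigned long slow_samples, unsigned long fast_samples)
{
    return (double)slow_samples / (double)fast_samples;
}

/* Prints the ratio for ff_deblock_v_luma_8_avx using the counts above. */
void print_deblock_ratio(void)
{
    const unsigned long app_samples    = 875253UL; /* my application */
    const unsigned long avplay_samples = 17207UL;  /* avplay */

    printf("ff_deblock_v_luma_8_avx sample ratio: %.1f\n",
           sample_ratio(app_samples, avplay_samples));
}
```

Since the invocation count is the same in both programs, this ratio (roughly 51) is also an estimate of how much more expensive each call is in my application.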

I've tried recompiling Libav with the --disable-avx option. It uses SSE2 now,
and there is almost no difference between the two applications, but avplay
causes a higher CPU load than before.

My question is: why is the AVX version of the H.264 loop filter so slow in
my application? (Note: I use av_malloc or other Libav functions for
all frame/packet/buffer allocations, so all buffers should be aligned.)
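
The alignment assumption can be double-checked with a small helper like the sketch below. It is plain C11 with no Libav calls. The names is_aligned and sample_plane are hypothetical, and the 16- and 32-byte boundaries are just the usual SSE and AVX register widths, not values taken from the Libav sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if ptr lies on a `boundary`-byte boundary, 0 otherwise.
 * `boundary` must be a power of two (e.g. 16 or 32). */
int is_aligned(const void *ptr, size_t boundary)
{
    return ((uintptr_t)ptr & (boundary - 1)) == 0;
}

/* Hypothetical stand-in for a frame plane: a buffer that the compiler
 * is asked to place on a 32-byte boundary. */
_Alignas(32) unsigned char sample_plane[64];
```

In the application, the same check could be applied to the data pointers of the frames handed to the decoder, e.g. is_aligned(ptr, 16), to confirm that the buffers really are aligned as expected.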

OP
_______________________________________________
libav-api mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-api
