On Tue, Jul 24, 2012 at 9:02 AM, John Stebbins <[email protected]> wrote:
> On 07/24/2012 05:53 PM, Jason Garrett-Glaser wrote:
>>
>> On Tue, Jul 24, 2012 at 8:34 AM, Måns Rullgård <[email protected]> wrote:
>>>
>>> Jason Garrett-Glaser <[email protected]> writes:
>>>
>>>> On Tue, Jul 24, 2012 at 8:05 AM, John Stebbins <[email protected]>
>>>> wrote:
>>>>>
>>>>> On 06/25/2012 02:42 PM, Mans Rullgard wrote:
>>>>>>
>>>>>> Module: libav
>>>>>> Branch: master
>>>>>> Commit: 82992604706144910f4a2f875d48cfc66c1b70d7
>>>>>>
>>>>>> Author:    Mans Rullgard <[email protected]>
>>>>>> Committer: Mans Rullgard <[email protected]>
>>>>>> Date:      Sat Jun 23 19:08:11 2012 +0100
>>>>>>
>>>>>> x86: fft: convert sse inline asm to yasm
>>>>>>
>>>>>> ---
>>>>>>
>>>>>>  libavcodec/x86/Makefile    |    1 -
>>>>>>  libavcodec/x86/fft_mmx.asm |  139 ++++++++++++++++++++++++++++++++++++++++---
>>>>>>  libavcodec/x86/fft_sse.c   |  110 ----------------------------------
>>>>>>  3 files changed, 129 insertions(+), 121 deletions(-)
>>>>>>
>>>>> Hi,
>>>>>
>>>>> This commit is causing some strange interaction with libx264 in
>>>>> HandBrake under certain conditions. x264 is encoding at about
>>>>> 1/10th it's normal rate after updating to this commit.
>>>>>
>>>>> A little more background. When doing ac3 passthru HandBrake
>>>>> encodes a single packet of silence data to ac3 that is uses for
>>>>> filling any gaps that it detects in the audio. Encoding of this
>>>>> packet happens before any other encoding or decoding starts. For
>>>>> some crazy reason, if we encode this silence, we get the x264
>>>>> slowdown. If we do not encode the silence, the speed is ok. I ran
>>>>> gprof on the code to see where all the time is being spent and it
>>>>> is all in x264. So it's not like there is some run-away loop
>>>>> somewhere that is bringing everything to it's knees. I'm guessing
>>>>> some cpu state must not be getting cleared or restored properly
>>>>> somewhere.
>>>>>
>>>>> John
>>>>
>>>> Could it have anything to do with denormals/NaN?
>>>
>>> Does x264 use floating-point SSE instructions anywhere?
>>
>> Yes, in macroblock-tree (because floating-point reciprocal is fast and
>> IDIV is slow), and in ratecontrol.
>>
>>
>
> I don't know if it is of any help, but here's the top entries from gprof
> when this slowdown is happening.
> x264 defaults + b-adapt=2
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  19.56     26.71    26.71                             x264_pixel_satd_16x4_internal_avx
>  17.85     51.08    24.37                             x264_pixel_satd_8x8_internal_avx
>  10.22     65.03    13.95                             x264_sub8x8_dct_avx.skip_prologue
>   9.11     77.47    12.44                             x264_hadamard_ac_8x8_avx
>   9.08     89.87    12.40                             x264_intra_sa8d_x9_8x8_avx
>   5.08     96.81     6.94                             x264_sub8x8_dct8_avx.skip_prologue
>   2.96    100.85     4.04                             x264_pixel_satd_4x4_avx
>   2.45    104.20     3.35                             x264_intra_satd_x9_4x4_avx
>   1.80    106.66     2.46                             x264_mc_chroma_avx
>   1.58    108.82     2.16                             x264_hpel_filter_avx
>   1.46    110.81     1.99                             x264_pixel_ssim_4x4x2_core_avx
>   1.21    112.46     1.65                             x264_add8x8_idct_avx.skip_prologue
>   1.09    113.95     1.49                             x264_pixel_ssd_16x16_avx
>   1.09    115.44     1.49                             x264_me_search_ref
>   1.02    116.83     1.39                             x264_add8x8_idct8_avx.skip_prologue
>
> According to top, all CPUs are fully saturated

That's an incredibly distorted profile -- it looks like all the AVX
functions are running incredibly slowly. Note that all those functions
do not use 256-bit AVX, only 128-bit AVX; Intel hasn't documented any
sort of slowdown when mixing 128-bit SSE and 128-bit AVX, which we do
without problems.

Could the problem be that ffmpeg is doing 256-bit AVX, but then not
using vzeroupper afterwards?

Which CPU is this anyways?

Jason
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
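
For reference, the vzeroupper question can be illustrated with a minimal
C sketch. It assumes GCC or Clang on x86 and uses hypothetical helpers
(sum8_avx, sum8_scalar, sum8) that are not taken from libav or x264. On
CPUs such as Sandy Bridge, Intel documents a state-transition penalty
when legacy (non-VEX) SSE instructions run while the upper halves of the
ymm registers are dirty; ending a 256-bit routine with vzeroupper (the
_mm256_zeroupper() intrinsic) is the usual way to avoid it. Whether that
is what is happening in this report is not established in the thread.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX_PATH 1

/* Hypothetical 256-bit kernel: sums eight floats using a ymm register. */
__attribute__((target("avx")))
static float sum8_avx(const float *src)
{
    __m256 v  = _mm256_loadu_ps(src);         /* touches the upper ymm half */
    __m128 lo = _mm256_castps256_ps128(v);    /* elements 0..3 */
    __m128 hi = _mm256_extractf128_ps(v, 1);  /* elements 4..7 */
    __m128 s  = _mm_add_ps(lo, hi);           /* four partial sums */
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);                    /* total in every lane */
    float out = _mm_cvtss_f32(s);

    /* Clear the upper ymm halves before returning to code that may
     * execute legacy SSE instructions. */
    _mm256_zeroupper();
    return out;
}
#endif

/* Portable fallback, used when AVX is unavailable. */
static float sum8_scalar(const float *src)
{
    float total = 0.0f;
    for (size_t i = 0; i < 8; i++)
        total += src[i];
    return total;
}

/* Dispatch at run time so the AVX path is only taken on capable CPUs. */
float sum8(const float *src)
{
#ifdef HAVE_AVX_PATH
    if (__builtin_cpu_supports("avx"))
        return sum8_avx(src);
#endif
    return sum8_scalar(src);
}
```

In hand-written assembly the equivalent is a plain vzeroupper instruction
before the routine's ret. Compilers may also insert it automatically for
intrinsic code, so the explicit call here is mainly for illustration.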
