On Tue, Jul 24, 2012 at 9:02 AM, John Stebbins <[email protected]> wrote:
> On 07/24/2012 05:53 PM, Jason Garrett-Glaser wrote:
>>
>> On Tue, Jul 24, 2012 at 8:34 AM, Måns Rullgård <[email protected]> wrote:
>>>
>>> Jason Garrett-Glaser <[email protected]> writes:
>>>
>>>> On Tue, Jul 24, 2012 at 8:05 AM, John Stebbins <[email protected]> wrote:
>>>>>
>>>>> On 06/25/2012 02:42 PM, Mans Rullgard wrote:
>>>>>>
>>>>>> Module: libav
>>>>>> Branch: master
>>>>>> Commit: 82992604706144910f4a2f875d48cfc66c1b70d7
>>>>>>
>>>>>> Author:    Mans Rullgard <[email protected]>
>>>>>> Committer: Mans Rullgard <[email protected]>
>>>>>> Date:      Sat Jun 23 19:08:11 2012 +0100
>>>>>>
>>>>>> x86: fft: convert sse inline asm to yasm
>>>>>>
>>>>>> ---
>>>>>>
>>>>>>    libavcodec/x86/Makefile    |    1 -
>>>>>>    libavcodec/x86/fft_mmx.asm |  139 ++++++++++++++++++++++++++++++++++++++++---
>>>>>>    libavcodec/x86/fft_sse.c   |  110 ----------------------------------
>>>>>>    3 files changed, 129 insertions(+), 121 deletions(-)
>>>>>>
>>>>> Hi,
>>>>>
>>>>> This commit is causing some strange interaction with libx264 in
>>>>> HandBrake under certain conditions.  x264 is encoding at about 1/10th
>>>>> its normal rate after updating to this commit.
>>>>>
>>>>> A little more background.  When doing ac3 passthru, HandBrake encodes
>>>>> a single packet of silence data to ac3 that it uses for filling any
>>>>> gaps that it detects in the audio.  Encoding of this packet happens
>>>>> before any other encoding or decoding starts.  For some crazy reason,
>>>>> if we encode this silence, we get the x264 slowdown.  If we do not
>>>>> encode the silence, the speed is ok.  I ran gprof on the code to see
>>>>> where all the time is being spent and it is all in x264.  So it's not
>>>>> like there is some run-away loop somewhere that is bringing
>>>>> everything to its knees.  I'm guessing some cpu state must not be
>>>>> getting cleared or restored properly somewhere.
>>>>>
>>>>> John
>>>>
>>>> Could it have anything to do with denormals/NaN?
>>>
>>> Does x264 use floating-point SSE instructions anywhere?
>>
>> Yes, in macroblock-tree (because floating-point reciprocal is fast and
>> IDIV is slow), and in ratecontrol.
>>
>>
>
> I don't know if it is of any help, but here's the top entries from gprof
> when this slowdown is happening.
> x264 defaults + b-adapt=2
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  19.56     26.71    26.71 x264_pixel_satd_16x4_internal_avx
>  17.85     51.08    24.37 x264_pixel_satd_8x8_internal_avx
>  10.22     65.03    13.95 x264_sub8x8_dct_avx.skip_prologue
>   9.11     77.47    12.44 x264_hadamard_ac_8x8_avx
>   9.08     89.87    12.40 x264_intra_sa8d_x9_8x8_avx
>   5.08     96.81     6.94 x264_sub8x8_dct8_avx.skip_prologue
>   2.96    100.85     4.04 x264_pixel_satd_4x4_avx
>   2.45    104.20     3.35 x264_intra_satd_x9_4x4_avx
>   1.80    106.66     2.46 x264_mc_chroma_avx
>   1.58    108.82     2.16 x264_hpel_filter_avx
>   1.46    110.81     1.99 x264_pixel_ssim_4x4x2_core_avx
>   1.21    112.46     1.65 x264_add8x8_idct_avx.skip_prologue
>   1.09    113.95     1.49 x264_pixel_ssd_16x16_avx
>   1.09    115.44     1.49 x264_me_search_ref
>   1.02    116.83     1.39 x264_add8x8_idct8_avx.skip_prologue
>
> According to top, all CPUs are fully saturated.

That's an incredibly distorted profile -- it looks like all the AVX
functions are running far slower than they should.

Note that all those functions do not use 256-bit AVX, only 128-bit
AVX; Intel hasn't documented any sort of slowdown when mixing 128-bit
SSE and 128-bit AVX, which we do without problems.

Could the problem be that ffmpeg is doing 256-bit AVX, but then not
using vzeroupper afterwards?  Which CPU is this anyways?
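
To make the vzeroupper question concrete, here is a minimal sketch of the
pattern being asked about: a routine that uses 256-bit AVX registers and
clears their upper halves before handing control back to code that may use
128-bit SSE. The names sum_floats, sum_avx256 and sum_scalar are made up for
illustration and are not taken from libav or x264. The sketch assumes GCC or
Clang on x86 (it relies on the target attribute and __builtin_cpu_supports)
and falls back to scalar code everywhere else.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX_PATH 1

/* Hypothetical helper: sums n floats with 256-bit AVX loads and adds. */
__attribute__((target("avx")))
static float sum_avx256(const float *src, size_t n)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;

    /* Main loop: 8 floats per iteration in one ymm register. */
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(src + i));

    float lanes[8];
    _mm256_storeu_ps(lanes, acc);

    /* Emits vzeroupper: zero the upper 128 bits of all ymm registers so
     * that later SSE code does not run with dirty upper state. */
    _mm256_zeroupper();

    float total = 0.0f;
    for (int k = 0; k < 8; k++)
        total += lanes[k];
    for (; i < n; i++)          /* scalar tail for the last n % 8 items */
        total += src[i];
    return total;
}
#else
#define HAVE_AVX_PATH 0
#endif

/* Portable fallback used when AVX is not available. */
static float sum_scalar(const float *src, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += src[i];
    return total;
}

/* Hypothetical entry point: picks the AVX path only if the CPU has AVX. */
float sum_floats(const float *src, size_t n)
{
#if HAVE_AVX_PATH
    if (__builtin_cpu_supports("avx"))
        return sum_avx256(src, n);
#endif
    return sum_scalar(src, n);
}
```

A C compiler targeting AVX will normally insert vzeroupper on its own at
function exits, so the explicit call above is mostly there to show where it
belongs. In hand-written assembly nothing adds it for you, which is why a
missing vzeroupper is worth ruling out here.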

Jason
