Re: [FFmpeg-devel] [PATCH] avutil/lls: speed up performance of solve_lls

Ganesh Ajjanagadde Tue, 24 Nov 2015 19:16:51 -0800

On Tue, Nov 24, 2015 at 8:32 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> On Tue, Nov 24, 2015 at 8:19 PM, Michael Niedermayer <michae...@gmx.at> wrote:
>> On Mon, Nov 23, 2015 at 08:00:12PM -0500, Ganesh Ajjanagadde wrote:
>>> This is a trivial rewrite of the loops that results in better cache
>>> efficiency. Essentially, iterating backwards over an array is a bad
>>> idea, since it leads to many cache misses: forward iteration fetches a
>>> new cache line that gets subsequently used, while backwards iteration 
>>> fetches the
>>> cache line thinking that the indexing would move forwards (the common 
>>> behavior),
>>> resulting in many cache misses.
>>>
>>> Speedup is nontrivial. Benchmarks obtained by 10^6 iterations within
>>> solve_lls, with START/STOP_TIMER. File is 
>>> tests/data/fate/flac-16-lpc-cholesky.err.
>>> Hardware: x86-64, Haswell, GNU/Linux.
>>>
>>> new:
>>>   12598 decicycles in solve_lls, 2096871 runs,    281 skips
>>>   12393 decicycles in solve_lls, 4193889 runs,    415 skips
>>>   12250 decicycles in solve_lls, 8387253 runs,   1355 skips
>>>   12585 decicycles in solve_lls,16775089 runs,   2127 skips
>>>   12785 decicycles in solve_lls,33550272 runs,   4160 skips
>>>   12483 decicycles in solve_lls,67101734 runs,   7130 skips
>>>   12610 decicycles in solve_lls,134202614 runs,  15114 skips
>>>
>>> old:
>>>   18101 decicycles in solve_lls, 2096659 runs,    493 skips
>>>   17850 decicycles in solve_lls, 4193609 runs,    695 skips
>>>   17757 decicycles in solve_lls, 8387458 runs,   1150 skips
>>>   17746 decicycles in solve_lls,16775207 runs,   2009 skips
>>>   17906 decicycles in solve_lls,33550820 runs,   3612 skips
>>>   17931 decicycles in solve_lls,67102891 runs,   5973 skips
>>>   17907 decicycles in solve_lls,134206693 runs,  11035 skips
>>>
>>> -------------------------------------------------------------------------------
>>> Barring asm for this function, there are two (more interesting) potential
>>> optimizations for the Cholesky decomposition here:
>>> 1. Notice that update_lls is doing a rank one update of the matrix via the 
>>> var
>>> vector. Rank one update of the Cholesky decomposition can be done much more
>>> efficiently than a full blown decomposition, see e.g Wikipedia. This will 
>>> almost
>>> certainly require some API thought.
>>>
>>> 2. LDL' form of the Cholesky factorization offers 2 main advantages:
>>> a) Avoiding sqrt and its associated cost, trading it off with extra 
>>> multiplications.
>>> This may or may not be worth the computational cost, though it seems like in
>>> FFmpeg, the Cholesky operates on small matrices of dim ~ 3-50, resulting in
>>> not too many extra multiplies.
>>> b) It provides benefits especially in the poorly conditioned
>>> case since one does not have to worry about nan's and associated tainting 
>>> due
>>> to the sqrt.
>>> This may or may not require API thought.
>>>
>>> I personally view 1 as being more unambiguously worthy of exploration at 
>>> this stage.
>>>
>>> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
>>> ---
>>>  libavutil/lls.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> this significantly changes the output for
>> libavutil/lls-test
>>
>> is that intended ?
>
> Wait, I thought there was only one line changed in a diff between the
> old and new output, and it was trivial involving a nan. Maybe I posted
> the wrong patch, in which case sorry.


I should have been suspicious of the huge perf change, there was an
extra stray line from the old patch I was working with. Reposted as
v2, updated perf numbers; it is far more modest as expected.
Explanation also improved.

>
>>
>> [...]
>> --
>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>>
>> I have often repented speaking, but never of holding my tongue.
>> -- Xenocrates
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] avutil/lls: speed up performance of solve_lls

Reply via email to