On Mon, Sep 20, 2010 at 12:10 PM, Johan Corveleyn <jcor...@gmail.com> wrote:
> On Mon, Sep 20, 2010 at 11:52 AM, Branko Čibej <br...@xbc.nu> wrote:
>> On 15.09.2010 14:20, Johan Corveleyn wrote:
>>> Some update on this: I have implemented this for svn_diff (excluding
>>> the identical prefix and suffix of both files, and only then starting
>>> to fill up the token tree and letting the lcs-algorithm do its thing).
>>> It makes a *huge* difference. On my bigfile.xml (1.5 MB) with only one
>>> line changed, the call to svn_diff_diff is ~10 times faster (15-20 ms
>>> vs. 150-170 ms).
>>
>>
>> Hmmm ... looks to me like test data tailored to the optimization. :)
>
> Nope, that's real data from a real repository, with a normal kind of
> change that happens here every day.
>
> Of course this optimization is most effective if there are a lot of
> common prefix/suffix lines. If there is a single change in the first
> line, and a single change in the last one, this optimization will do
> nothing but introduce a little bit of extra overhead. And it will
> obviously make the most impact on large files (in fact it's just
> relative to the ratio of the "number of common prefix/suffix lines" to
> the "number of lines in between").
>
> I'm sorry it takes me longer than expected to post a version of this
> to the list, but I'm still having some problems with a couple of edge
> conditions (I'm learning C as I go, and I'm struggling with a couple
> of pointer calculations/comparisons). I plan to post something during
> this week...
Johan,

No need to apologize. Thanks for coming to the retreat at Hursley this past weekend; the discussion there really helped clarify some of the concepts around your patches. Keep up the good work!

-Hyrum