On Fri, Apr 1, 2022 at 5:19 PM Stefan Sperling <s...@elego.de> wrote:
>
> On Fri, Apr 01, 2022 at 05:04:49PM +0200, Johan Corveleyn wrote:
> > Yes, I suppose this is the case: Patience feeds different (smaller)
> > things to LCS. Because, as far as I understand, Myers' way of
> > calculating the LCS is fundamentally "somewhat" quadratic (according
> > to the Myers paper from 1986 [1], titled "An O(ND) Difference
> > Algorithm and Its Variations").
> >
> > That is: O(ND) where N = the sum of the lengths of both sides of the
> > diff and D = the size of the minimal diff. That's why eliminating the
> > common prefix and suffix (added as an optimization years ago) is
> > useful, it makes N much smaller.
> >
> > Now, in cases where both texts are huge (even after prefix-suffix
> > stripping), but the minimal diff is small, the algorithm should in
> > theory be fast (because D is small). Just not sure what is going wrong
> > then in our variation of this algorithm. Or have we implemented
> > another algorithm than the one described by Myers?
>
> SVN diff is allegedly "based on the approach described by ... Meyers (sic),
> [...]  but has been modified for better performance."
> I guess the modifications refer to prefix/suffix scanning,
> because this description dates from r1128862.

Hm, not the prefix/suffix scanning, as I implemented those (it was my
first big patch / work in SVN), and that's outside of the LCS
algorithm (it happens in a phase before assembling "tokens (=lines)"
and before handing it off to the LCS algorithm). It was integrated
into trunk in r1067800 (and some further fixes followed afterwards).
Basically, that work is done in
libsvn_diff/diff_file.c#find_identical_{prefix,suffix}.

I suppose the "modified for better performance" refers to some
optimisations done by Morten Kloster, who then later submitted the
patch adding this comment in r1128862. His optimisations were more
related to the LCS algorithm (further reducing the problem space, and
providing early exits in certain cases or some such).

The core LCS algorithm in libsvn_diff/lcs.c was from way before my
time, and according to 'svn blame' was mainly written by Sander
Striker. I don't understand it fully either :-), but I always assumed
it was basically some clever implementation of Myers' algorithm.


--
Johan

Reply via email to