Re: [PATCH] limit diff effort to fix performance issue

Nathan Hartman Wed, 02 Jun 2021 08:25:42 -0700

On Wed, Jun 2, 2021 at 8:20 AM Stefan Sperling <s...@elego.de> wrote:


> I've heard of several instances where users complain that 'svn update'
> takes an extraordinary amount of time to run (in terms of hours).


Ouch!

P vs NP situation? In general I think it is reasonable to limit effort
in order to generate acceptable results (in this case diffs that work)
in a reasonable time (seconds).

Thoughts below:

(snip)

As a reference, Git manages to produce a diff in about 13 seconds:
>
> $ time git diff 01.arxml 02.arxml > /dev/null
>     0m13.39s real     0m13.04s user     0m00.36s system


(snip)

The patch below implements an effort limit for Subversion's LCS diff.
> Diffing the offending XML files in a Subversion working copy becomes
> fast enough to be usable:
>
> $ time svn diff > /dev/null
>     0m06.82s real     0m05.40s user     0m01.41s system
>
> The resulting diff doesn't look as pretty as Git's diff output, but I don't
> think this will matter in practice.


One observation is that Git appears to try for twice as long in this
case, so perhaps doubling our effort limit from 1024 to 2048 will give
(some) parity between the two VCSs' prettiness and performance here?
(Of course, this case is just one data point...)

Another thought is to limit based on time rather than iterations, or
based on (time x file size). But for files that hit the limit, that
would cause non-deterministic output (inconsistent from one run to
another), which is kind of A Bad Thing. Perhaps (iterations x file
size) would be better, as that would be deterministic.

My only concern is that some corner-case diffs that are currently readable
> could become less readable. If such cases are found, we could easily
> address
> the problem by bumping the limit. Note: Any such diffs would still be
> *valid*.


(snip)

Before anyone asks, without evidence that it is really needed I don't think
> adding a config option to set this limit is warranted. Other
> implementations
> are also using hard-coded limits. We could add an environment variable to
> allow for experimentation, perhaps. In any case, I would prefer to make
> such tweaks in follow-up patches and keep this initial fix simple so it
> can be backported easily.


Agreed that it doesn't justify a new config.

I think a config doesn't make sense anyway: If the user wants to
control diff quality, a command line argument or API parameter would
allow us to keep performance snappy by default but provide for a
deep(er) think when the user specifically asks for it. (How would it
be specified? A time limit in seconds?) But, agreed that anything
like that, if done, should be a totally separate patch.

Thanks for the detailed write-up and patch.

I'll run some experiments...

Cheers,
Nathan

Re: [PATCH] limit diff effort to fix performance issue

Reply via email to