On Wed, Jun 2, 2021 at 8:20 AM Stefan Sperling <s...@elego.de> wrote:
> I've heard of several instances where users complain that 'svn update' > takes an extraordinary amount of time to run (in terms of hours). Ouch! P vs NP situation? In general I think it is reasonable to limit effort in order to generate acceptable results (in this case diffs that work) in a reasonable time (seconds). Thoughts below: (snip) As a reference, Git manages to produce a diff in about 13 seconds: > > $ time git diff 01.arxml 02.arxml > /dev/null > 0m13.39s real 0m13.04s user 0m00.36s system (snip) The patch below implements an effort limit for Subversion's LCS diff. > Diffing the offending XML files in a Subversion working copy becomes > fast enough to be usable: > > $ time svn diff > /dev/null > 0m06.82s real 0m05.40s user 0m01.41s system > > The resulting diff doesn't look as pretty as Git's diff output, but I don't > think this will matter in practice. One observation is that Git appears to try for twice as long in this case, so perhaps doubling our effort limit from 1024 to 2048 will give (some) parity between the two VCSs' prettiness and performance here? (Of course, this case is just one data point...) Another thought is to limit based on time rather than iterations, or based on (time x file size). But for files that hit the limit, that would cause non-deterministic output (inconsistent from one run to another), which is kind of A Bad Thing. Perhaps (iterations x file size) would be better, as that would be deterministic. My only concern is that some corner-case diffs that are currently readable > could become less readable. If such cases are found, we could easily > address > the problem by bumping the limit. Note: Any such diffs would still be > *valid*. (snip) Before anyone asks, without evidence that it is really needed I don't think > adding a config option to set this limit is warranted. Other > implementations > are also using hard-coded limits. We could add an environment variable to > allow for experimentation, perhaps. In any case, I would prefer to make > such tweaks in follow-up patches and keep this initial fix simple so it > can be backported easily. Agreed that it doesn't justify a new config. I think a config doesn't make sense anyway: If the user wants to control diff quality, a command line argument or API parameter would allow us to keep performance snappy by default but provide for a deep(er) think when the user specifically asks for it. (How would it be specified? A time limit in seconds?) But, agreed that anything like that, if done, should be a totally separate patch. Thanks for the detailed write-up and patch. I'll run some experiments... Cheers, Nathan