[ 
https://issues.apache.org/jira/browse/SVN-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14920134#comment-14920134
 ] 

Ivan Zhakov edited comment on SVN-1966 at 10/17/15 3:25 PM:
------------------------------------------------------------

By default GNU diff uses heuristics to shortcut the minimal diff algorithm;
when these kick in, the resulting diff is not known to be a minimal one.
libsvn_diff doesn't have such heuristics and always computes a minimal diff;
this looks like one of those cases where a minimal diff is very expensive to
compute. GNU diff's --minimal option disables the heuristics; use it on the
test files and the GNU diff time goes through the roof. This doesn't really
explain the memory use, although as far as I recall GNU diff makes use of
stack memory for some things that libsvn_diff does on the heap.
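To make the cost of an exact minimal diff concrete, here is a minimal Python sketch of Myers' greedy shortest-edit-script search, a standard algorithm for computing a minimal diff. It is an illustrative stand-in only: this sketch does not claim to be what libsvn_diff or GNU diff actually run, and the function name `myers_edit_distance` is hypothetical. It returns D, the number of line insertions plus deletions in a minimal diff, and it never stops early, which is the behaviour described above for libsvn_diff.

```python
from typing import Sequence


def myers_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Return D, the length of a shortest edit script (insertions plus
    deletions) turning sequence a into sequence b.

    Illustrative sketch of Myers' greedy O((N+M)*D) search; it always
    finds the exact minimum and has no shortcut heuristics.
    """
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d  # shift diagonal index k (= x - y) into a valid list index
    # v[k + offset] holds the furthest x reached so far on diagonal k
    v = [0] * (2 * max_d + 2)
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1 + offset] < v[k + 1 + offset]):
                x = v[k + 1 + offset]      # move down: insert a line from b
            else:
                x = v[k - 1 + offset] + 1  # move right: delete a line of a
            y = x - k
            # follow the "snake" of matching lines along diagonal k for free
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k + offset] = x
            if x >= n and y >= m:
                return d  # first d that reaches the end is the minimum
    return max_d  # not reached: deleting all of a and inserting all of b works
```

The outer loop runs once per edit step d, so the total work is O((N+M)·D). When a large fraction of lines change, as in the files attached to this issue where one line in three is modified, D grows in proportion to the file length and the search approaches quadratic time; for example, prefixing "2" to 10 of 30 distinct lines gives D = 20. Shortcut heuristics like GNU diff's, which this sketch does not model, are what avoid that exhaustive search.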



was (Author: philipm): same text as above, wrapped differently.


> libsvn_diff needs 'non-minimal-diff' mode.
> ------------------------------------------
>
>                 Key: SVN-1966
>                 URL: https://issues.apache.org/jira/browse/SVN-1966
>             Project: Subversion
>          Issue Type: Improvement
>          Components: libsvn_diff
>    Affects Versions: all
>            Reporter: Ben Collins-Sussman
>            Assignee: Sander Striker
>            Priority: Critical
>             Fix For: unscheduled
>
>         Attachments: 1_xmlfiles.tar.gz
>
>
> I've got two OpenOffice XML files here, each about 9 megs in size. The
> second file was produced by simply adding a "2" to the beginning of every
> third line (...we used a tiny Python script to do the transform.)
> When I compare the two files using GNU diff, it takes 14 seconds.
> When I compare the two files using libsvn_diff, it never finishes... or
> rather, I killed the process after 20 minutes when I noticed the memory
> footprint had grown to 128 megs and was still growing. The footprint was
> only ~20 megs for the first 10 minutes, which is expected behavior (about
> 2x the size of the file), but I have no idea why the footprint started
> growing after that.
> I'm using the subversion/tests/libsvn_diff/diff-test binary, by the way.
> I'm attaching the two XML files to this mail for people to reproduce.
> Sander, any idea what's going on? I know that libsvn_diff is sometimes a
> *bit* slower than gdiff, but this scenario seems way out of whack. Cmpilato
> and I discovered this bug when a CollabNet customer attempted to run
> 'svn merge': the client started running svn_diff_diff3() on two 8 meg XML
> files similar to this one (with a similar density of changed lines)... and
> after 20 minutes, the server just timed out and closed the socket,
> presumably because the socket filled up with the REPORT response and the
> client stopped reading from it.
> Anyway, I hope this is a simple bug in libsvn_diff...
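The "tiny python script" mentioned in the issue is not included in this thread, so what follows is a hypothetical reconstruction of the transform, useful for generating test inputs with the same density of changed lines. It assumes that "every third line" means lines 3, 6, 9, ... counting from 1, and that the XML file is UTF-8 text; the function names `add_two_every_third_line` and `transform_file` are illustrative and not taken from the original script.

```python
def add_two_every_third_line(lines: list[str]) -> list[str]:
    """Prefix "2" to every third line (lines 3, 6, 9, ... counting from 1).

    Assumption: this is how the original transform counted lines.
    """
    return [
        "2" + line if (i + 1) % 3 == 0 else line
        for i, line in enumerate(lines)
    ]


def transform_file(src: str, dst: str) -> None:
    """Write a copy of src to dst with every third line prefixed by "2".

    newline="" keeps the original line endings (LF or CRLF) untouched,
    so the only differences between src and dst are the added "2"s.
    """
    with open(src, encoding="utf-8", newline="") as f:
        lines = f.readlines()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.writelines(add_two_every_third_line(lines))
```

For example, `transform_file("a.xml", "b.xml")` (hypothetical file names) writes the modified copy, which can then be compared with GNU diff, with and without `--minimal`, and with the diff-test binary.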



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
