[ On Monday, January 7, 2002 at 14:35:20 (-0500), Eric Siegerman wrote: ] > Subject: Re: conflict algorithm > > On Mon, Jan 07, 2002 at 06:48:20PM +0000, Duncan Sommerville wrote: > > Whats the definition of a conflict? > > > > In particular, what's the *scope* of its search - is it per line, per couple of >lines, etc? > > I don't believe that's formally defined; in practice, it's "per few lines".
Ah! A topic close to my heart! :-) The algorithm used by 'diff' (and thus 'diff3', thus implicitly defining the meaning of a "conflict" in CVS) has been formally defined many times and in several different ways! :-) This note from the footnotes of: <URL:http://www.people.fas.harvard.edu/~lib113/reference/unix/unix9.txt> Unix diff itself is by Doug McIlroy, and is based on an algorithm invented independently by Harold Stone and by Wayne Hunt and Tom Szymanski. (See "A fast algorithm for computing longest common subsequences," by J. W. Hunt and T. G. Szymanski, CACM, May, 1977.) The diff algorithm is described in M. D. McIlroy and J.W. Hunt, Technical Report 41, 1976. To quote McIlroy, "I had tried at least three completely different algorithms before the final one. diff is a quintessential case of not settling for mere competency in a program but revising until it was right." In the Bell Laboratories UNIX 7th Edition source we see this claim: * Uses an algorithm due to Harold Stone, which finds * a pair of longest identical subsequences in the two * files. and the history of the choice of algorithm can be found in here: <URL:http://attila.stevens-tech.edu/~lbernste/njcse.html> The Tale of "diff": a Caution for Developers of Reusable Modules. (with the permission of Doug McIlroy, inventor of diff) Once upon a time, there was a mathematical problem of finding the longest subsequence of lines common to two files. "No sweat," thought the developer. A dynamic programming technique that takes time `mn' and space `mn' to compare an `m'-line file to an `n'-line file would do the trick. But space `mn' was unacceptable on the small machines of yesteryear. "OK, we'll fly seat of the pants," thought our hero. So he read through both files until he found a line that disagreed, and then figured he would somehow search back and forth in both until he got back in sync. 'Somehow' was the killer. Suppose the second line in one file agreed with the fourth line ahead in the other and vice versa. How to choose? Then news came from afar in Princeton that the Wizard Hirschberger had seen a way to reduce space `mn' by a mathematical method to space `m', while only doubling the time. "Good deal!" thought our guy. "Now we can afford to run it. It was slow, but it did work and gave an explainable 'right' answer in a clearly defined way." But the people complained. When they moved a paragraph, it showed up as two changes, a deletion here and an addition there. So our hero made a "diff" that found moves. It was again seat of the pants, but it ran pretty well. Yet, sometimes, an evil occurred. If the people ran it on stuff where the same line occurred in many places, like assembly language or text processing, it discovered lots of deletions and additions that could be explained as moves. Our hero was filled with consternation. Then along came a shining knight, Harold Stone, with a dynamic programming technique that reduced the running time from the product to the sum of the file lengths, except in unnatural cases. Now here was something fast enough to use on big files, efficient in space and time, mathematically justifiable as giving a good answer, and experimentally shown to be physiologically useful. But then the people tinkered. Three times they altered output. They added features. They added stars! And the tinkering caused the code to increase and the manual to swell to half again its size. "Well," said our guy. "It is important to know when to stop." I don't know if the "Stone" algorithm was ever formally published though. People have since complained that the "Stone" algorithm is very accurate but not very fast and uses some fixed amount of memory for each difference found so more research was done. >From the GNU Diff manual: The basic algorithm is described in "An O(ND) Difference Algorithm and its Variations", Eugene W. Myers, `Algorithmica' Vol. 1 No. 2, 1986, pp. 251-266; and in "A File Comparison Program", Webb Miller and Eugene W. Myers, `Software--Practice and Experience' Vol. 15 No. 11, 1985, pp. 1025-1040. The algorithm was independently discovered as described in "Algorithms for Approximate String Matching", E. Ukkonen, `Information and Control' Vol. 64, 1985, pp. 100-118. More detail of the implementation currently used by CVS is provided in the GNU Diff manual in the section titled "What Comparison Means". Unfortunately the Texinfo manual is not included in the CVS distribution -- you'll have to obtain it separately in the GNU Diffutils distribution (which you'll have on most any modern *BSD or Linux box already, just type "info diff" and read all about it!). > As I understand it, the algorithm is basically: > diff -u common-ancestor new-version-1 > diff -u common-ancestor new-version-2 > > munge all the results together; call it a conflict if a difference > block from one "diff" overlaps a block from the other > > So basically it depends on the nature of the changes, and specifically on how > the diff algorithm chooses to report them. Actually it's a wee bit simpler than that (with the magic burried inside of the 'diff3' program): When CVS says: Merging differences between 1.X and 1.Y into somefile it does so using the command: diff3 -ATam somefile somefile:1.X somefile:1.Y (somefile:1.X and somefile:1.Y are temporary files created by checking out the explicitly specified revisions) Well, actually only my version of CVS uses '-AT'. The original CVS uses '-E' My version shows all the conflicting lines from all the three files, not just the latter two, and I find this almost infinitely easier to deal with when manually resolving the conflicts. <<<<<<< somefile changed line[s] unique to local copy of somefile changed line[s] unique to local copy of somefile changed line[s] unique to local copy of somefile ||||||| 1.X matching line[s] unique to version 1.X of somefile matching line[s] unique to version 1.X of somefile matching line[s] unique to version 1.X of somefile ======= matching line[s] unique to version 1.Y of somefile matching line[s] unique to version 1.Y of somefile matching line[s] unique to version 1.Y of somefile >>>>>>> 1.Y Here's an edited version of the manual page describing only just those options my version of CVS uses to do the merge: NAME diff3 - find differences between three files SYNOPSIS diff3 [options] mine older yours DESCRIPTION The diff3 command compares three files and outputs descriptions of their differences. The files to compare are mine, older, and yours. At most one of these three file names may be -, which tells diff3 to read the standard input for that file. Options -A Incorporate all changes from older to yours into mine, surrounding all conflicts with bracket lines. -T Output a tab rather than two spaces before the text of a line in normal format. This causes the align- ment of tabs in the line to look normal. -a Treat all files as text and compare them line-by- line, even if they do not appear to be text. -m Apply the edit script to the first file and send the result to standard output. Unlike piping the output from diff3 to ed, this works even for binary files and incomplete lines. -A is assumed if no edit script option is specified. If anyone's interested I have changes to the "sanity.sh" tests to accomodate my change to using "diff3 -A" and I can make them available upon request. Currently my version of sanity.sh is based on rev# 1.716 -- Greg A. Woods +1 416 218-0098; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Planix, Inc. <[EMAIL PROTECTED]>; VE3TCP; Secrets of the Weird <[EMAIL PROTECTED]> _______________________________________________ Info-cvs mailing list [EMAIL PROTECTED] http://mail.gnu.org/mailman/listinfo/info-cvs
