Albert Reiner <[EMAIL PROTECTED]> writes:

>> For LaTeX etc, line breaks are fairly arbitrary - ie. paragraphs get
>> rebroken all the time. Wouldn't it make sense to use the same old
>> diff, but over words (whitespace-separated tokens) instead of lines?

> - Tokens are not always delimited by whitespace; e.g., in noweb you
>   might write <<foo bar>>_baz, of which "<<foo bar>>" should probably
>   be counted as a token for some operations, and "<<foo bar>>_baz" for
>   others.

So it is not a complete, end-all solution.  However, it would localize
changes to smaller pieces, and thus make it easier to commute patches
past them.

> - In quite a few instances (such as, e.g., the typical mmm-mode emacs
>   files, or literate programming source files) the notion of a token
>   must be different in different parts of the file.

Must?  I'm not sure what mmm-mode is (my XEmacs doesn't seem to have
it), but I don't understand why there would be any absolute
requirements.

> - Whitespace (or indentation) is obviously significant in some
>   languages like, e.g., Haskell.  How would token diffing distinguish
>   between things that differ only in indentation?

By keeping track of indentation, just as current diff deals with
blank lines.
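
A rough sketch of what keeping track of indentation could look like,
assuming each line is split into its leading whitespace plus its words
(Python only for brevity; the name indent_and_words and the sample
lines are made up for illustration):

```python
def indent_and_words(line):
    """Return (leading whitespace, list of words) for one line of text."""
    body = line.lstrip(" \t")
    indent = line[: len(line) - len(body)]
    return indent, body.split()

# Two hypothetical lines that differ only in indentation.
outer = "where x = 1"
inner = "    where x = 1"

# The word lists are identical, but the indentation component differs,
# so a diff over these pairs would still report a change.
print(indent_and_words(outer))
print(indent_and_words(inner))
```

The indentation is just one more item to compare, so lines that differ
only in indentation remain distinguishable.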

> - as with the (imho) simpler idea of allowing formatting filters

It doesn't seem any simpler to me.  My belief (possibly unfounded) is
that standard diff does something like:

   minimum_edit_distance (lines x) (lines y)

My suggestion is to use:

   minimum_edit_distance (words x) (words y)

except that words must retain "empty" words as well, i.e.
words "foo  bar" should be ["foo","","bar"], or perhaps
["foo "," ","bar"].

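To make the comparison concrete, here is a minimal sketch in Python
(chosen only for brevity).  It assumes the first variant of words, that
spaces and newlines count as the same separator, and that an edit is an
insertion or a deletion; the sample text is invented, and none of this
is meant to describe what diff or darcs actually do.

```python
import re

def words(text):
    """Split on every single whitespace character, keeping empty words.

    Assumption: spaces and newlines are treated alike, so which separator
    was used is forgotten; a real tool would have to record that somewhere.
    """
    return re.split(r"\s", text)

def minimum_edit_distance(xs, ys):
    """Fewest insertions plus deletions that turn the list xs into ys."""
    # prev[j] is the distance between the prefix of xs handled so far
    # and ys[:j]; initially that prefix is empty, so the distance is j.
    prev = list(range(len(ys) + 1))
    for i, x in enumerate(xs, start=1):
        cur = [i]  # turning xs[:i] into the empty list takes i deletions
        for j, y in enumerate(ys, start=1):
            if x == y:
                cur.append(prev[j - 1])
            else:
                cur.append(1 + min(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

old = "the quick brown\nfox jumps over\nthe lazy dog"
rebroken = "the quick\nbrown fox jumps\nover the lazy dog"
edited = "the quick\nbrown cat jumps\nover the lazy dog"

print(words("foo  bar"))                                            # ['foo', '', 'bar']
print(minimum_edit_distance(old.split("\n"), rebroken.split("\n")))  # 6
print(minimum_edit_distance(words(old), words(rebroken)))            # 0
print(minimum_edit_distance(words(old), words(edited)))              # 2
```

Rebreaking the paragraph changes every line but no word, and a one-word
change after rebreaking costs two word edits instead of six line edits.
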
I don't claim to understand patch commutation, so it is possible,
perhaps even likely, that this won't reduce conflicts or make it
easier to commute across e.g. reformatting.  But it seems to me that
if darcs can commute patches that don't touch the same lines, this
should extend to patches that don't touch the same words.

The obvious downside is the cost of diff: the standard algorithm is
O(n²) using dynamic programming, which, at ten words per line, would
make it two orders of magnitude more costly.  I seem to remember diff
tools cheating a bit, though.
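
The arithmetic behind "two orders of magnitude", as a small sketch.  It
assumes a full dynamic-programming table of size n times n and a
hypothetical file of 1000 lines; the helper name dp_cells is made up.

```python
def dp_cells(n_old, n_new):
    """Cells in a full dynamic-programming table for two token lists."""
    return n_old * n_new

n_lines = 1000        # hypothetical file length, in lines
words_per_line = 10   # the figure assumed above

line_cost = dp_cells(n_lines, n_lines)
word_cost = dp_cells(n_lines * words_per_line, n_lines * words_per_line)

# (10n)^2 / n^2 = 100, whatever n is.
print(word_cost // line_cost)   # 100
```

If whitespace were kept as separate tokens, the token count per line
(and hence the factor) would be larger still.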

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants


_______________________________________________
darcs-users mailing list
[email protected]
http://www.abridgegame.org/mailman/listinfo/darcs-users
