Albert Reiner <[EMAIL PROTECTED]> writes:

>> For LaTeX etc, line breaks are fairly arbitrary - ie. paragraphs get
>> rebroken all the time. Wouldn't it make sense to use the same old
>> diff, but over words (whitespace-separated tokens) instead of lines?

> - Tokens are not always delimited by whitespace; e.g., in noweb you
>   might write <<foo bar>>_baz, of which "<<foo bar>>" should probably
>   be counted as a token for some operations, and "<<foo bar>>_baz" for
>   others.

So it is not a complete end-all solution. However, it would localize
changes to smaller pieces, and thus make it easier to commute patches
across them.

> - In quite a few instances (such as, e.g., the typical mmm-mode emacs
>   files, or literate programming source files) the notion of a token
>   must be different in different parts of the file.

Must? I'm not sure what mmm-mode is (my XEmacs doesn't seem to have
it), but I don't understand why there would be any absolute
requirements.

> - Whitespace (or indentation) is obviously significant in some
>   languages like, e.g., Haskell. How would token diffing distinguish
>   between things that differ only in indentation?

By keeping track of indentation, just as current diff deals with
blank lines.

> - as with the (imho) simpler idea of allowing formatting filters

It doesn't seem any simpler to me. My belief (perhaps unfounded and
possibly wrong?) is that standard diff does something like:

    minimum_edit_distance (lines x) (lines y)

My suggestion is to use:

    minimum_edit_distance (words x) (words y)

except that words must retain "empty" words as well, i.e.

    words "foo bar"

should be ["foo","","bar"], or perhaps ["foo "," ","bar"].

I don't claim to understand patch commutation, so it is possible,
perhaps even likely, that this isn't going to reduce conflicts or make
it easier to commute across e.g. reformatting. But it seems to me
that if darcs is commuting patches that don't touch the same lines,
this should extend to patches not touching the same words.

The obvious downside is the cost of diff: the standard algorithm is
O(n²) using dynamic programming, which, at ten words per line, would
make it two orders of magnitude more costly. I seem to remember diff
tools cheating a bit, though.
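
To make the suggestion above concrete, here is a minimal sketch in Python rather than Haskell. It is an illustration under stated assumptions, not how diff or darcs actually work: `tokens` is a hypothetical stand-in for a whitespace-retaining `words` (it keeps each whitespace run as its own token, a third variant next to the two mentioned above), and `minimum_edit_distance` is plain Levenshtein distance with unit-cost substitution, computed by the O(n²) dynamic programme.

```python
import re


def tokens(text):
    """Split text into runs of non-whitespace and runs of whitespace.

    Whitespace runs are kept as tokens, so "".join(tokens(t)) == t.
    (Assumed tokenisation for illustration only.)
    """
    return re.findall(r"\S+|\s+", text)


def minimum_edit_distance(xs, ys):
    """Levenshtein distance between two sequences.

    Insert, delete and substitute each cost 1. Runs in
    O(len(xs) * len(ys)) time, keeping one previous row of the table.
    """
    prev = list(range(len(ys) + 1))
    for i, x in enumerate(xs, 1):
        cur = [i]
        for j, y in enumerate(ys, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or keep
        prev = cur
    return prev[-1]


# The same paragraph, broken at a different point.
old = "the quick brown\nfox jumps over\n"
new = "the quick\nbrown fox jumps over\n"

line_distance = minimum_edit_distance(old.splitlines(), new.splitlines())
token_distance = minimum_edit_distance(tokens(old), tokens(new))
```

In this example both distances come out as 2, but they mean different things: at line level, both of the two lines are touched, whereas at token level only two whitespace tokens out of twelve differ and every word is unchanged. On cost: with ten words per line, n grows tenfold and n² a hundredfold; keeping whitespace runs as tokens, as this sketch does, roughly doubles the token count again.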

-k
--
If I haven't seen further, it is by standing in the footprints of giants
_______________________________________________
darcs-users mailing list
[email protected]
http://www.abridgegame.org/mailman/listinfo/darcs-users
