Hi Jeff,

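It looks like the Python difflib module is the bad guy here. It has a
heuristic to speed up degenerate cases: once a sequence has 200 or more
elements, any element that accounts for more than 1% of it is ignored
when looking for matches. On long lines this can lead to unintuitive
diffs like the ones you are seeing.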

You can disable the heuristic by copying the standard module
difflib.py into the meld directory and making the change below at
line 316. Be aware that this will slow meld down for some comparisons
(100x for some generated files).
-                if n >= 200 and len(indices) * 100 > n:
+                if 0 and n >= 200 and len(indices) * 100 > n:
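
For reference, here is a minimal sketch of the effect. It assumes a Python
version whose difflib.SequenceMatcher accepts the autojunk keyword (newer
releases do; the difflib.py discussed above may not), which switches the
same heuristic off without patching the module. The sample text and the
helper name matched_chars are made up for illustration.

```python
from difflib import SequenceMatcher


def matched_chars(a, b, autojunk):
    """Return how many characters of `a` difflib pairs up with `b`."""
    matcher = SequenceMatcher(None, a, b, autojunk=autojunk)
    return sum(block.size for block in matcher.get_matching_blocks())


# Hypothetical stand-in for a long paragraph stored on one line (300 chars).
old = "the quick brown fox " * 15
# The same paragraph with a single comma inserted after the tenth "fox".
new = old[:199] + "," + old[199:]

# With the heuristic on, every letter is "popular" and gets ignored, so the
# text after the comma is not matched. With it off, the whole line matches.
with_heuristic = matched_chars(old, new, autojunk=True)
without_heuristic = matched_chars(old, new, autojunk=False)
print(with_heuristic, without_heuristic)
```

With the heuristic enabled, only the text before the comma should be reported
as matching, which looks like the behaviour in meld1.png.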

I'm reasonably familiar with the string matching algorithms. I think
difflib could be modified without having to reinvent the wheel. For
instance, it usually makes sense to strip common suffixes and prefixes
from in-line differences, never mind the warning on line 393.
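
A rough sketch of the prefix/suffix idea, not a patch against difflib
itself: the helper names split_common_affixes and inline_opcodes are
hypothetical, and the output simply mimics the tuples returned by
SequenceMatcher.get_opcodes(). Only the middle part that actually differs
is handed to difflib.

```python
from difflib import SequenceMatcher


def split_common_affixes(a, b):
    """Return the lengths of the common prefix and common suffix of a and b."""
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix could overlap the prefix already counted.
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix, suffix


def inline_opcodes(a, b):
    """Opcodes for a -> b, diffing only the part between the common affixes."""
    prefix, suffix = split_common_affixes(a, b)
    mid_a = a[prefix:len(a) - suffix]
    mid_b = b[prefix:len(b) - suffix]
    ops = []
    if prefix:
        ops.append(("equal", 0, prefix, 0, prefix))
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, mid_a, mid_b).get_opcodes():
        # Shift the indices back into the coordinates of the full strings.
        ops.append((tag, i1 + prefix, i2 + prefix, j1 + prefix, j2 + prefix))
    if suffix:
        ops.append(("equal", len(a) - suffix, len(a), len(b) - suffix, len(b)))
    return ops


print(inline_opcodes("Hello world", "Hello, world"))
```

For a single inserted comma the middle part is just the comma, so the
heuristic never comes into play however long the line is.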

Stephen.

On 8/23/06, Jeff Smith <[EMAIL PROTECTED]> wrote:
> My text files are formatted with carriage returns at the end of paragraphs.
> This means that a single line of text could have several hundred words. Most
> other diff tools tell me that Old Paragraph is different from New Paragraph
> and stop right there - they don't tell me where the two lines differ. This
> is most vexing when the difference is a single space, or a single
> punctuation point. The thing I love about meld is that it highlights the
> actual words that are different, and doesn't just tell me that my 235 words
> of text differ at some unspecified point in the line.
>
> Of course, in my context, there are still some cases where the word-level
> highlighting of meld is not as robust as it could be, but it works
> surprisingly well - especially if its design didn't give much thought to
> this particular context of use.
>
> I'm including two screen captures to show you what I mean. In meld1.png, the
> two highlighted paragraphs differ by a single comma. Meld points it out to
> me, but fails to show me that the two lines are identical after the comma.
> meld2.png, on the other hand, DOES show me that the lines are identical
> after the single word of difference is accounted for. I realize that I'm
> likely a complete outlier in your user community, but I'd love to see an
> even more robust line differentiation algorithm that was better able to show
> such in-line differences.
_______________________________________________
meld-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/meld-list
