Re: [fossil-users] Side-by-side diff and non-English text

Martijn Coppoolse Thu, 28 Feb 2013 01:25:09 -0800

On 28-2-2013 9:33, Sergei Gavrikov wrote:

For example


   
http://chiselapp.com/user/sg/repository/pangrams/fdiff?v1=edab872a806e8d4c&v2=6936fca46ff9d180

   Left-side hunk: 30
   Right-side hunks: 29, 81, 126, 137

Of course, unified diff has no such quirks.

It looks like the side-by-side diff algorithm isn't UTF-8-aware, and looks at the text byte-for-byte. Consequently, it may detect a difference in the second byte of a multi-byte character, and start marking a difference right in the middle of that character. The result is an invalid single-byte character, followed by an HTML tag, followed by another (possibly also invalid) single-byte character. The same could happen at the end of a different sequence, of course, if the first byte is different but the second identical.
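To make the failure mode concrete, here is a small Python sketch. It is not Fossil's code (Fossil is written in C, and I haven't checked how its diff is actually implemented); the helper names are made up for illustration. It only shows that a byte-wise comparison of two UTF-8 strings can stop in the middle of a two-byte character, leaving invalid fragments on both sides of the split:

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix, compared byte by byte."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_valid_utf8(data: bytes) -> bool:
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Two Cyrillic strings that differ only in their last character.
# U+0434 encodes as D0 B4 and U+0435 as D0 B5: same lead byte.
old = "\u0430\u0434".encode("utf-8")  # b'\xd0\xb0\xd0\xb4'
new = "\u0430\u0435".encode("utf-8")  # b'\xd0\xb0\xd0\xb5'

# The byte-wise comparison only stops at the last byte,
# i.e. inside the second character.
split = common_prefix_len(old, new)

# Both pieces around the split are invalid UTF-8 on their own:
# a truncated sequence before it, a lone continuation byte after it.
prefix_ok = is_valid_utf8(old[:split])
changed_ok = is_valid_utf8(old[split:])
```

Wrapping `old[split:]` in a highlighting tag would therefore put the tag between the lead byte and the continuation byte of one character.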

If fossil knows that a text is UTF-8-encoded, the diff algorithm should ideally compare characters (which may span multiple bytes), and not bytes.
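One way to get there without rewriting the comparison itself might be to keep comparing bytes, but move any highlight boundary back to the start of the character it falls in. The Python sketch below is only an illustration under that assumption; `is_continuation` and `snap_to_char_start` are hypothetical names, not anything that exists in Fossil. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx:

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def snap_to_char_start(data: bytes, pos: int) -> int:
    """Move pos backwards until it no longer points at a continuation byte.

    Assumes data is valid UTF-8; positions 0 and len(data) are left alone.
    """
    while 0 < pos < len(data) and is_continuation(data[pos]):
        pos -= 1
    return pos


# U+0430 U+0434 in UTF-8: D0 B0 D0 B4.
old = "\u0430\u0434".encode("utf-8")

# Assume a byte-wise comparison reported the first difference at offset 3,
# which is the continuation byte of the second character.
byte_split = 3
char_split = snap_to_char_start(old, byte_split)

# Both sides of the adjusted boundary now decode cleanly.
unchanged = old[:char_split].decode("utf-8")
changed = old[char_split:].decode("utf-8")
```

The end of a changed region would need the mirror-image adjustment (moving forward past continuation bytes). Decoding both texts and comparing code points instead of bytes would be the more direct reading of the suggestion above.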

Adding a setting indicating a default diff would perhaps be easier, in the short term. :-)

--
Martijn Coppoolse
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
