On 28-2-2013 9:33, Sergei Gavrikov wrote:
For example:

   http://chiselapp.com/user/sg/repository/pangrams/fdiff?v1=edab872a806e8d4c&v2=6936fca46ff9d180

   Left-side hunk: 30
   Right-side hunks: 29, 81, 126, 137

Of course, unified diff has no such quirks.

It looks like the side-by-side diff algorithm isn't UTF-8-aware and compares the text byte by byte. Consequently, it may detect a difference in the second byte of a multi-byte character and start marking the difference right in the middle of that character. The result is an incomplete (and therefore invalid) byte sequence, followed by an HTML tag, followed by a stray continuation byte that is also invalid on its own. The same can happen at the end of a changed region, of course, if the first byte of a character differs but the second is identical.
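
To illustrate the suspected failure mode, here is a minimal Python sketch. It is not Fossil's code; the helper names (common_prefix_len, is_valid_utf8) and the <span> markup are made up for the example. It only shows how a byte-wise comparison can place a tag between the two bytes of one character.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b share (a naive byte-wise compare)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The two lines differ only in the last character, but both characters
# start with the same lead byte (0xC3) and differ in the second byte.
old = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'
new = "caf\u00e8".encode("utf-8")   # b'caf\xc3\xa8'

# The byte-wise compare reports the change as starting after the lead byte.
split = common_prefix_len(old, new)

# Wrapping the "changed" bytes in markup (hypothetical tag) splits the character.
marked = new[:split] + b"<span>" + new[split:] + b"</span>"
```

Here split is 4, one byte past the start of the last character, so marked contains a lone 0xC3 before the tag and a lone continuation byte inside it, and it no longer decodes as UTF-8.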

If fossil knows that a text is UTF-8-encoded, the diff algorithm should ideally compare characters (which may span multiple bytes), and not bytes.
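
One possible way to get there without rewriting the comparison itself is to keep comparing bytes but move each hunk boundary to the nearest character boundary before emitting markup. The sketch below is only an assumption about how that could look, not a description of Fossil's code; the function names are hypothetical. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```python
def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the form 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def snap_to_char_start(data: bytes, i: int) -> int:
    """Move offset i backwards until it sits on the first byte of a character."""
    while 0 < i < len(data) and is_continuation(data[i]):
        i -= 1
    return i


def snap_to_char_end(data: bytes, i: int) -> int:
    """Move offset i forwards until it sits just past the last byte of a character."""
    while i < len(data) and is_continuation(data[i]):
        i += 1
    return i


# Offset 4 points into the middle of the two-byte character at the end.
data = "caf\u00e8".encode("utf-8")   # b'caf\xc3\xa8'
start = snap_to_char_start(data, 4)  # moves back to the lead byte
end = snap_to_char_end(data, 4)      # moves forward past the character
```

With the boundaries adjusted this way, data[:start] and data[start:end] are both valid UTF-8 on their own. Decoding the text and comparing code points instead of bytes would be the cleaner fix, at the cost of touching more of the diff code.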

Adding a setting that selects the default diff style would perhaps be easier in the short term. :-)
--
Martijn Coppoolse
