On 28-2-2013 9:33, Sergei Gavrikov wrote:
> For example
> http://chiselapp.com/user/sg/repository/pangrams/fdiff?v1=edab872a806e8d4c&v2=6936fca46ff9d180
> Left-side hunk: 30
> Right-side hunks: 29, 81, 126, 137
> Of course, unified diff has no such quirks.
It looks like the side-by-side diff algorithm isn't UTF-8-aware and
compares the text byte by byte. Consequently, it may detect a
difference in the second byte of a multi-byte character and start
marking the difference right in the middle of that character. The
result is an invalid byte sequence (a lone lead byte), followed by an
HTML tag, followed by another, possibly also invalid, sequence (a lone
continuation byte). The same can happen at the end of a changed region,
of course, if the first byte of a character differs but the second is
identical.
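To make the effect concrete, here is a small Python sketch. It is not
Fossil's code; the helper names and the sample strings are made up for
illustration. It finds the first differing byte of two UTF-8 strings
and shows that splitting there leaves two fragments that are not valid
UTF-8 on their own.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative inputs: the last character differs only in its 2nd byte.
old = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'
new = "caf\u00e8".encode("utf-8")  # b'caf\xc3\xa8'

# A byte-wise comparison puts the split between 0xC3 and 0xA8.
split = common_prefix_len(old, new)
unchanged, changed = new[:split], new[split:]

print(split, unchanged, changed)
print(is_valid_utf8(unchanged), is_valid_utf8(changed))
```

If an HTML tag is emitted at that split point, the browser receives a
lone lead byte before the tag and a lone continuation byte after it.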
If Fossil knows that a text is UTF-8-encoded, the diff algorithm should
ideally compare characters (which may span multiple bytes) rather than
bytes. Adding a setting to select the default diff type would perhaps
be easier in the short term. :-)
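One possible way to keep a byte-wise comparison but avoid splitting
characters is to move the split point back to the start of the
character it falls in. The Python sketch below only illustrates that
idea and assumes both inputs are valid UTF-8; the function names are
hypothetical and nothing here is taken from Fossil's source. It relies
on the fact that UTF-8 continuation bytes always have the bit pattern
10xxxxxx.

```python
def byte_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def snap_to_char_start(data: bytes, i: int) -> int:
    """Move offset i back until it no longer points at a continuation byte."""
    while 0 < i < len(data) and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


def split_at_first_difference(old: bytes, new: bytes) -> tuple:
    """Split new into (unchanged, changed) on a character boundary."""
    i = snap_to_char_start(new, byte_prefix_len(old, new))
    return new[:i].decode("utf-8"), new[i:].decode("utf-8")


# Illustrative inputs: the last character differs only in its 2nd byte.
old = "caf\u00e9".encode("utf-8")
new = "caf\u00e8".encode("utf-8")

same, diff = split_at_first_difference(old, new)
print(repr(same), repr(diff))
```

The end of a changed region would need the mirror-image adjustment,
moving forward past any continuation bytes. Decoding both texts first
and comparing code points would avoid the problem altogether.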
--
Martijn Coppoolse