On Tue, Nov 11, 2014 at 1:12 PM, Jan Nijtmans <jan.nijtm...@gmail.com>
wrote:

> The convention on Windows is to assume CP1252, unless the file
> starts with the UTF-8 BOM. That's exactly what fossil is doing here:
>      <http://fossil-scm.org/index.html/artifact/cbd7a598c8?ln=1745-1747>
> So, make sure that the file starts with the UTF-8 BOM, otherwise
> fossil cannot make any valid guess on what encoding is used.
> Assuming UTF-8 on Windows is wrong, because if it is really CP1252
> then that leads to invalid UTF-8 byte sequences.
>

i think those very lines are the fix. The poster was using 1.29 (from June
2014), which is much newer than the most recent change to those lines, and
he claims that the text is in UTF-8. So it sounds to me like the fix for him
is "use a BOM or CP1252" (though i assume CP1252 cannot represent Chinese).
It seems to me that Fossil is doing all that it can there: it follows a
heuristic for determining the encoding, and no heuristic is infallible.
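
To make the heuristic concrete, here is a minimal Python sketch of the rule
as Jan describes it (UTF-8 if the file starts with the UTF-8 BOM, otherwise
CP1252). It is not Fossil's actual code; the names guess_encoding,
decode_guessed and is_valid_utf8 are made up for illustration.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: pick a codec using the BOM-or-CP1252 rule."""
    if data.startswith(codecs.BOM_UTF8):
        # 'utf-8-sig' decodes UTF-8 and drops the leading BOM.
        return "utf-8-sig"
    return "cp1252"


def decode_guessed(data: bytes) -> str:
    """Decode with the guessed codec, replacing undecodable bytes."""
    return data.decode(guess_encoding(data), errors="replace")


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


utf8_no_bom = "中文".encode("utf-8")
utf8_with_bom = codecs.BOM_UTF8 + utf8_no_bom

# With the BOM, the guess is right and the text round-trips.
print(ascii(decode_guessed(utf8_with_bom)))

# Without the BOM, the same bytes are read as CP1252 and come out as
# mojibake rather than the original Chinese text.
print(ascii(decode_guessed(utf8_no_bom)))

# The opposite mistake: CP1252 bytes treated as UTF-8 are not well-formed.
print(is_valid_utf8("café".encode("cp1252")))
```

The last line illustrates Jan's point about why blindly assuming UTF-8 is
risky, and the second print shows the poster's situation: valid UTF-8
without a BOM gets misread under this rule.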

Regarding the BOM: the Unicode Consortium does not recommend using a BOM in
UTF-8. (A) It is senseless there for its original purpose, marking byte
order, and (B) many tools don't deal well with it (i've seen PHP sites go
offline when someone checked a BOM into one of the source files). i
understand that it's probably the lesser of several evils here, though.
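
For anyone unsure what the BOM looks like on disk, a small Python sketch
follows. The sample bytes and the strip_utf8_bom helper are invented for
illustration. A plain UTF-8 decode keeps the BOM as a leading U+FEFF
character, which is presumably the sort of stray leading content that trips
up tools that don't expect it.

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by the start of a script.
data = codecs.BOM_UTF8 + b"<?php echo 'hi';"


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return raw without a leading UTF-8 BOM, if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


# The UTF-8 BOM is the three bytes EF BB BF.
print(codecs.BOM_UTF8.hex())

# Plain 'utf-8' keeps the BOM as U+FEFF; 'utf-8-sig' drops it.
print(ascii(data.decode("utf-8")[:6]))
print(ascii(data.decode("utf-8-sig")[:5]))

# Stripping at the byte level gives the content a BOM-unaware tool expects.
print(strip_utf8_bom(data)[:5])
```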

-- 
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
_______________________________________________
fossil-dev mailing list
fossil-dev@lists.fossil-scm.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/fossil-dev
