On Tue, Nov 11, 2014 at 1:12 PM, Jan Nijtmans <jan.nijtm...@gmail.com> wrote:
> The convention on Windows is to assume CP1252, unless the file
> starts with the UTF-8 BOM. That's exactly what fossil is doing here:
> <http://fossil-scm.org/index.html/artifact/cbd7a598c8?ln=1745-1747>
> So, make sure that the file starts with the UTF-8 BOM, otherwise
> fossil cannot make any valid guess on what encoding is used.
> Assuming UTF-8 on windows is wrong, because if it is really CP1252
> then that leads to invalid utf-8 byte sequences.
>

i think those very lines are the fix. The poster was using 1.29 (from June, 2014), which is much newer than the most recent change to those lines. The poster claims that the text is in UTF-8, so it sounds to me like the fix for him is "use a BOM or CP1252" (though CP1252 cannot represent Chinese text).

It seems to me that Fossil is doing all that it can there: it follows a heuristic for determining the encoding, and no heuristic is infallible.

Regarding the BOM: the Unicode standard does not recommend a BOM for UTF-8, because (A) it's senseless for its original purpose (marking byte order) in UTF-8 and (B) so many tools don't deal well with it (i've seen PHP sites go offline when someone checked a BOM into one of the source files). i understand that it's probably the lesser of several evils here, though.

-- 
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
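
P.S. To make the heuristic concrete, here is a minimal Python sketch of the convention described in the quoted text: treat the bytes as UTF-8 only when they start with the UTF-8 BOM, otherwise assume CP1252. This is not Fossil's actual code (that is C, at the link above); the names guess_encoding and decode_guess are hypothetical and exist only for this illustration.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper illustrating the convention; not Fossil's code."""
    # A leading UTF-8 BOM (bytes EF BB BF) is the only signal this sketch trusts.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM while decoding
    # No BOM: fall back to the Windows convention and assume CP1252.
    return "cp1252"


def decode_guess(data: bytes) -> str:
    # errors="replace" because a few byte values are undefined in CP1252.
    return data.decode(guess_encoding(data), errors="replace")


# Two Chinese characters, written with escapes to keep this file ASCII-only.
chinese = "\u4e2d\u6587".encode("utf-8")   # UTF-8 bytes, no BOM
with_bom = codecs.BOM_UTF8 + chinese       # same text, BOM prepended

print(ascii(decode_guess(with_bom)))  # BOM present: decoded as UTF-8
print(ascii(decode_guess(chinese)))   # no BOM: misread as CP1252
```

Without the BOM the UTF-8 bytes are read as CP1252 and come out as mojibake, which matches what the poster reported. The reverse guess fails differently: a CP1252 byte string such as b"caf\xe9" is not valid UTF-8 at all, so a strict UTF-8 decoder rejects it.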