Just in case it helps, Ruby (since version 1.9) also uses 3).
Regards, Martin.
On 2012/11/17 6:48, Buck Golemon wrote:
When decoding bytes to Unicode using the latin1 scheme, there are three
options for bytes not defined in the ISO-8859-1 standard.
1) Throw an error.
2) Insert the
On 2012/11/17 9:45, Doug Ewell wrote:
If he is targeting HTML5, then none of this matters, because HTML5 says
that ISO 8859-1 is really Windows-1252.
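To make the difference concrete, here is a minimal Python sketch, assuming Python 3 and its built-in `latin-1` and `cp1252` codecs. The sample bytes are arbitrary illustrations and do not come from the thread.

```python
# Compare how two Python codecs treat bytes in the 0x80-0x9F range.
raw = bytes([0x80, 0x93, 0x94])

# latin-1: each byte becomes the code point with the same numeric value.
as_latin1 = raw.decode("latin-1")

# cp1252 (Windows-1252): the same bytes are assigned printable characters.
as_cp1252 = raw.decode("cp1252")

print([hex(ord(ch)) for ch in as_latin1])
print([hex(ord(ch)) for ch in as_cp1252])

# Python's cp1252 codec leaves a few bytes (0x81, for example) unassigned,
# so strict decoding of such a byte raises an error.
try:
    b"\x81".decode("cp1252")
    undefined_raises = False
except UnicodeDecodeError:
    undefined_raises = True
print("0x81 rejected by cp1252:", undefined_raises)
```

A consumer that treats the iso-8859-1 label as Windows-1252 would get the second result, while a strict ISO 8859-1 reading gives the first.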
Yes. But unless Python wants to limit its use to HTML5, this should be
handled on a separate level (mapping an iso-8859-1 label to the
So don't say that there are one-for-one equivalences.
I was just quoting this section of the standard:
http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
There is a simple, one-to-one mapping between 7-bit (and 8-bit) control
codes and the Unicode control codes: every 7-bit (or 8-bit)
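As a rough illustration of the one-to-one mapping described in the quoted passage, the following Python sketch (assuming Python 3 and its built-in `latin-1` codec, which is not something the standard itself prescribes) checks that every byte value decodes to the code point of the same number, control codes included.

```python
import unicodedata

# Decode every possible byte value with Python's latin-1 codec.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Each byte maps to the code point with the same numeric value.
code_points = [ord(ch) for ch in decoded]

# The C0 (0x00-0x1F) and C1 (0x80-0x9F) ranges come out as
# Unicode control characters (general category "Cc").
c0_are_controls = all(unicodedata.category(ch) == "Cc" for ch in decoded[0x00:0x20])
c1_are_controls = all(unicodedata.category(ch) == "Cc" for ch in decoded[0x80:0xA0])

# Because the mapping is one-to-one, encoding restores the original bytes.
round_trip = decoded.encode("latin-1")
```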
Martin J. Dürst wrote:
If he is targeting HTML5, then none of this matters, because HTML5
says that ISO 8859-1 is really Windows-1252.
Yes. But unless Python wants to limit its use to HTML5, this should be
handled on a separate level (mapping an iso-8859-1 label to the
Windows-1252 decoder
IMO this isn't worth the effort being spent on it. MOST encodings have all
sorts of interesting quirks, variations, OEM- or app-specific behavior, etc.
These are a few code points that haven't really caused much confusion, and
other code pages are much more confusing (like the CJK ones in