date:20121117

Re: latin1 decoder implementation

2012-11-17 Thread Martin J. Dürst

Just in case it helps, Ruby (since version 1.9) also uses 3).

Regards,
Martin.

On 2012/11/17 6:48, Buck Golemon wrote:
When decoding bytes to unicode using the latin1 scheme, there are three options for bytes not defined in the ISO-8859-1 standard. 1) Throw an error. 2) Insert the
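The three options Buck lists can be sketched as a small decoder with a selectable policy. This is an illustration only, not Python's or Ruby's codec. The quoted message is cut off after option 2, so reading option 2 as "insert U+FFFD REPLACEMENT CHARACTER" and option 3 as "map the byte to the C1 control code of equal value" is an assumption. It also assumes the bytes in question are 0x80-0x9F. The function name `decode_latin1` and the policy strings are hypothetical.

```python
# Hypothetical sketch: decode ISO-8859-1 with a selectable policy for the
# 0x80-0x9F range, which ISO-8859-1 leaves without graphic characters.
# Policy names are illustrative and not taken from any real codec.

REPLACEMENT_CHAR = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def decode_latin1(data: bytes, policy: str = "error") -> str:
    """Decode bytes as ISO-8859-1, handling 0x80-0x9F according to policy.

    policy:
      "error"   -> raise ValueError on the first byte in 0x80-0x9F
      "replace" -> substitute U+FFFD for each such byte
      "control" -> map the byte to the C1 control code point of equal value
    """
    if policy not in ("error", "replace", "control"):
        raise ValueError(f"unknown policy: {policy!r}")
    out = []
    for i, b in enumerate(data):
        if 0x80 <= b <= 0x9F:
            if policy == "error":
                raise ValueError(
                    f"byte 0x{b:02X} at offset {i} has no ISO-8859-1 graphic character"
                )
            out.append(REPLACEMENT_CHAR if policy == "replace" else chr(b))
        else:
            # Every other byte value is numerically equal to its code point.
            out.append(chr(b))
    return "".join(out)
```

For example, `decode_latin1(b"caf\xe9 \x85", "replace")` returns `"café \ufffd"`, while the `"control"` policy returns `"café \x85"`, which is also what Python's built-in `latin-1` codec produces for the same bytes.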

Re: latin1 decoder implementation

2012-11-17 Thread Martin J. Dürst

On 2012/11/17 9:45, Doug Ewell wrote:
If he is targeting HTML5, then none of this matters, because HTML5 says that ISO 8859-1 is really Windows-1252.

Yes. But unless Python wants to limit its use to HTML5, this should be handled on a separate level (mapping an iso-8859-1 label to the
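Martin's suggestion of handling this on a separate level can be sketched as a label-resolution step that runs before any decoder is chosen. This is a minimal sketch, assuming a small, non-exhaustive subset of the labels that the WHATWG Encoding Standard (which HTML5 relies on) treats as windows-1252. The names `decode_for_html5` and `HTML5_WINDOWS_1252_LABELS` and the error-handler name `c1-passthrough` are hypothetical. Python's own `cp1252` codec rejects bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D, so the sketch registers an error handler that passes them through as the C1 control of equal value, which is how the WHATWG windows-1252 index maps them.

```python
import codecs

# Hypothetical, non-exhaustive subset of labels that the WHATWG Encoding
# Standard resolves to windows-1252.
HTML5_WINDOWS_1252_LABELS = {
    "iso-8859-1", "iso8859-1", "iso_8859-1", "latin1", "l1",
    "us-ascii", "cp1252", "windows-1252",
}


def _c1_passthrough(exc: UnicodeDecodeError):
    # Python's cp1252 codec leaves a few bytes undefined; map each such byte
    # to the code point of equal value instead of raising.
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(b) for b in bad), exc.end


# Hypothetical handler name, registered once at import time.
codecs.register_error("c1-passthrough", _c1_passthrough)


def decode_for_html5(data: bytes, label: str) -> str:
    """Resolve the label first, then hand the bytes to the chosen decoder."""
    name = label.strip().lower()
    if name in HTML5_WINDOWS_1252_LABELS:
        return data.decode("cp1252", errors="c1-passthrough")
    return data.decode(name)
```

The ordinary `iso-8859-1` codec is left untouched: only the label table decides whether a caller asking for ISO-8859-1 gets Windows-1252 behavior, which keeps the HTML5 rule out of the codec itself.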

Re: cp1252 decoder implementation

2012-11-17 Thread Buck Golemon

So don't say that there are one-for-one equivalences.

I was just quoting this section of the standard: http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf

There is a simple, one-to-one mapping between 7-bit (and 8-bit) control codes and the Unicode control codes: every 7-bit (or 8-bit)
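The one-to-one mapping Buck quotes from the standard can be checked directly. This is a minimal sketch, assuming Python's built-in `latin-1` codec and the standard `unicodedata` module. Every C0 byte (0x00-0x1F, 0x7F) and C1 byte (0x80-0x9F) decodes to the code point with the same numeric value, and each of those code points has general category Cc (control).

```python
import unicodedata

# C0 control codes (0x00-0x1F plus DEL 0x7F) and C1 control codes (0x80-0x9F).
C0 = list(range(0x00, 0x20)) + [0x7F]
C1 = list(range(0x80, 0xA0))

for b in C0 + C1:
    ch = bytes([b]).decode("latin-1")
    assert ord(ch) == b                      # numerically equal code point
    assert unicodedata.category(ch) == "Cc"  # general category: control

print("all", len(C0 + C1), "control codes map to equal code points")
```

Python's `latin-1` codec maps every byte `b` to `chr(b)`, so the loop passes for the full control range. Whether that identity mapping is the right default for a decoder is the question the thread is debating.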

Re: latin1 decoder implementation

2012-11-17 Thread Doug Ewell

Martin J. Dürst wrote:
If he is targeting HTML5, then none of this matters, because HTML5 says that ISO 8859-1 is really Windows-1252.
Yes. But unless Python wants to limit its use to HTML5, this should be handled on a separate level (mapping an iso-8859-1 label to the Windows-1252 decoder

RE: cp1252 decoder implementation

2012-11-17 Thread Shawn Steele

IMO this isn't worth the effort being spent on it. MOST encodings have all sorts of interesting quirks, variations, OEM or App specific behavior, etc. These are a few code points that haven't really caused much confusion, and other code pages are much more confusing (like the CJK ones in
