The same chapter makes a normative reference to ISO/IEC 2022 for C0 controls, but it does not say that this concerns ISO/IEC 8859 (which does not itself reference ISO/IEC 2022 as normative, only as informational, just to say that it is compatible with it, as well as with ISO 6429 and a wide range of other international and national standards and various private ones, though not all of them: e.g. the VISCII national standard is not compatible with ISO/IEC 2022).
2012/11/17 Buck Golemon <b...@yelp.com>:

> > So don't say that there are one-for-one equivalences.
>
> I was just quoting this section of the standard:
> http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
>
> > There is a simple, one-to-one mapping between 7-bit (and 8-bit)
> > control codes and the Unicode control codes: every 7-bit (or 8-bit)
> > control code is numerically equal to its corresponding Unicode code
> > point.
>
> A one-to-one equivalence between bytes and Unicode code points is
> exactly what is specified here, limited to the domain of "8-bit
> control codes".
>
> On Fri, Nov 16, 2012 at 9:48 PM, Philippe Verdy <verd...@wanadoo.fr> wrote:
>
>> If you are thinking about "byte values", you are working at the
>> encoding scheme level (in fact at another, lower level which defines
>> a protocol presentation layer, e.g. "transport syntaxes" in MIME).
>> Unicode code points are conceptually not an encoding scheme, just a
>> coded character set (independent of the encoding scheme).
>>
>> Separate the levels of abstraction and you'll be much better off.
>> Forget the apparent homonymies that exist between distinct layers of
>> abstraction and use each standard for what it is designed for
>> (including the Unicode "character/glyph model", which does not
>> define an encoding scheme).
>>
>> So don't say that there are one-for-one equivalences. This is wrong:
>> an adaptation layer must exist between abstraction levels and
>> between separate standards, but the Unicode standard does not
>> specify them completely (with the only exception of the standard
>> UTF encoding schemes, which are just one possible adaptation across
>> some abstraction levels, and are not made to adapt alone to
>> standards other than the Unicode standard itself).
>>
>> 2012/11/17 Buck Golemon <b...@yelp.com>:
>>
>>> On Fri, Nov 16, 2012 at 4:11 PM, Doug Ewell <d...@ewellic.org> wrote:
>>>
>>>> Buck Golemon wrote:
>>>>
>>>>> Is it incorrect to say that 0x81 is a non-semantic byte in
>>>>> cp1252, and to map it to the equally non-semantic U+0081?
>>>>>
>>>>> This would allow systems that follow the html5 standard and use
>>>>> cp1252 in place of latin1 to continue to be binary-faithful and
>>>>> reversible.
>>>>
>>>> This isn't quite as black-and-white as the question about Latin-1.
>>>> If you are targeting HTML5, you are probably safe in treating an
>>>> incoming 0x81 (for example) as either U+0081 or U+FFFD, or
>>>> throwing some kind of error.
>>>
>>> Why do you make this conditional on targeting html5?
>>>
>>> To me, replacement and error are out because they mean the system
>>> loses data or completely fails where it used to succeed.
>>> Currently there's no reasonable way for me to implement the U+0081
>>> option other than inventing a new "cp1252+latin1" codec, which
>>> seems undesirable.
>>>
>>>> HTML5 insists that you treat 8859-1 as if it were CP1252, so it no
>>>> longer matters what the byte is in 8859-1.
>>>
>>> I feel like you skipped a step. The byte is 0x81, full stop. I
>>> agree that it doesn't matter how it's defined in latin1 (also, it's
>>> not defined in latin1).
>>> The section of the Unicode standard that says control codes are
>>> equal to their Unicode characters doesn't mention latin1. Should
>>> it? I was under the impression that it meant any single-byte
>>> encoding, since it goes out of its way to talk about "8-bit"
>>> control codes.
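
To make the "cp1252+latin1" codec idea above concrete: here is a minimal
sketch in Python 3, using the standard codecs error-handler machinery
rather than a whole new codec (the handler name 'cp1252+latin1' is just
an illustrative label, not an existing handler). It decodes cp1252
normally, but maps the five bytes cp1252 leaves undefined (0x81, 0x8D,
0x8F, 0x90, 0x9D) to the C1 control code points with the same numeric
value, and maps those code points back on encoding, so the round trip
is byte-faithful:

    import codecs

    # The five byte values that windows-1252 (cp1252) leaves undefined.
    UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

    def passthrough(exc):
        if isinstance(exc, UnicodeDecodeError):
            # Decode side: map an undefined byte to the C1 control code
            # point with the same numeric value (0x81 -> U+0081, etc.).
            byte = exc.object[exc.start]
            if byte in UNDEFINED:
                return chr(byte), exc.start + 1
        elif isinstance(exc, UnicodeEncodeError):
            # Encode side: map those code points back to the same byte,
            # so decode followed by encode is byte-for-byte reversible.
            cp = ord(exc.object[exc.start])
            if cp in UNDEFINED:
                return bytes([cp]), exc.start + 1
        raise exc

    codecs.register_error('cp1252+latin1', passthrough)

    raw = b'\x81\x93quoted\x94'                   # 0x93/0x94: cp1252 curly quotes
    text = raw.decode('cp1252', 'cp1252+latin1')  # '\x81\u201cquoted\u201d'
    assert text.encode('cp1252', 'cp1252+latin1') == raw

This is roughly the same reversibility trick as Python 3's built-in
'surrogateescape' handler, except that surrogateescape parks undecodable
bytes on the lone surrogates U+DC80..U+DCFF instead of the C1 controls,
precisely so they cannot collide with real U+0081 data. (The WHATWG
Encoding Standard's windows-1252 index takes the U+0081 position,
mapping those five bytes to the corresponding C1 controls.)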