On 2012/11/17 12:54, Buck Golemon wrote:
On Fri, Nov 16, 2012 at 4:11 PM, Doug Ewelld...@ewellic.org wrote:
Buck Golemon wrote:
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
to map it to the equally-non-semantic U+81 ?
U+0081 (there are always at least four
No-one would be more happy than me if we could just ditch all the legacy
encodings and all switch to Unicode everywhere, but that will never happen.
There is enough legacy content out there that will never be converted.
That's sort of exactly the point:
*NEW* content should be UTF-8 (or
: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
Of Masatoshi Kimura
Sent: Wednesday, November 21, 2012 12:28 PM
To: unicode@unicode.org
Subject: Re: cp1252 decoder implementation
(2012/11/22 1:58), Shawn Steele wrote:
We aren’t going to change names (since that’ll break anyone
Mailing List; Buck Golemon
*Subject:* RE: cp1252 decoder implementation
Philippe commented: “(even if later Microsoft decides to map some other
characters in its own windows-1252 charset, like it did several times and
notably when the Euro symbol was mapped)”.
Personal
Buck Golemon wrote:
The status of these 5 characters is already in the best fit mappings
document pointed to by the IANA registry entry for windows-1252,
which is as strong as I’m willing to go for them.
I don't understand the relation between bestfit1252 and cp1252. Could
you clarify it for me?
On 2012-11-21 19:30:50, Doug Ewell d...@ewellic.org wrote:
My problem is with the double standard. In some people's minds, if IE
does it, it's called moronic or brain-dead.
If the software with the biggest market share does it, then everyone else
will have to follow it, no matter what you
Netscape 1.0 RC 1 is available here:
http://www.oldversion.com/Netscape.html
On 2012/11/21 16:23, Peter Krefting wrote:
Doug Ewell d...@ewellic.org:
Somewhat off-topic, I find it amusing that tolerance of poorly
encoded input is considered justification for changing the underlying
standards,
The encoding work at W3C, at least as far as I see it, is not an attempt
to
-bou...@unicode.org] On Behalf
Of Murray Sargent
Sent: Tuesday, November 20, 2012 8:55 PM
To: verd...@wanadoo.fr; Doug Ewell
Cc: Unicode Mailing List; Buck Golemon
Subject: RE: cp1252 decoder implementation
Philippe commented: “(even if later Microsoft decides to map some other
characters in its own
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
But maybe we could ask Microsoft to officially map the C1 controls onto
the remaining holes of windows-1252, to help improve the
interoperability in HTML5 with a predictable and stable behavior
across HTML5 applications. In that case
Peter Krefting peter at opera dot com wrote:
Somewhat off-topic, I find it amusing that tolerance of poorly
encoded input is considered justification for changing the
underlying standards, when Internet Explorer has been flamed for
years and years for tolerating bad input.
It's called
Maybe you've forgotten FrontPage, a product acquired by Microsoft and then
developed by Microsoft and widely promoted as part of Office, which
insisted on declaring webpages as ISO 8859-1 even if they contained
characters that are only in windows-1252. Even if we edited the page
externally to
(2012/11/22 1:58), Shawn Steele wrote:
We aren’t going to change names (since that’ll break
anyone already using them), we probably won’t recognize new names (since
a new name wouldn’t work on millions of existing
computers, so no one would add one).
Hey, why Microsoft changed
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
Maybe you've forgotten FrontPage, a product acquired by Microsoft and
then developed by Microsoft and widely promoted as part of Office,
which insisted on declaring webpages as ISO 8859-1 even if they
contained characters that are
Buck Golemon buck at yelp dot com wrote:
What effort has been spent? This is not an either/or type of
proposition. If we can agree that it's an improvement (albeit small),
let's update the mapping.
Is it much harder than I believe it is?
ISO/IEC 8859-1 is, uh, an ISO/IEC standard. CP1252 is
To resolve the situation, it would be smarter if the W3C referenced not
the Microsoft standard itself but a standardized version of it, explaining
explicitly how to handle the unassigned code positions. The W3C could
describe the expected mapping of these positions explicitly in its own
Philippe commented: “(even if later Microsoft decides to map some other
characters in its own windows-1252 charset, like it did several times and
notably when the Euro symbol was mapped)”.
Personal opinion, but I'd be very surprised if Microsoft ever changed the 1252
charset. The euro was added
But maybe we could ask Microsoft to officially map the C1 controls onto the
remaining holes of windows-1252, to help improve the interoperability in
HTML5 with a predictable and stable behavior across HTML5 applications. In
that case the W3C need not do anything else and there's no need to
Doug Ewell d...@ewellic.org:
Somewhat off-topic, I find it amusing that tolerance of poorly encoded
input is considered justification for changing the underlying standards,
when Internet Explorer has been flamed for years and years for
tolerating bad input.
It's called adapting to
Hi
On 21 November 2012 16:42, Philippe Verdy verd...@wanadoo.fr wrote:
But maybe we could ask Microsoft to officially map the C1 controls onto the
remaining holes of windows-1252, to help improve the interoperability in
HTML5 with a predictable and stable behavior across HTML5 applications. In
The same chapter makes a normative reference to ISO/IEC 2022 for C0
controls; it does not say that this concerns ISO/IEC 8859 (which does not
itself reference ISO/IEC 2022 as normative, but only informationally,
just to say that it is compatible with it, as well as with ISO 6429, and a
wide
I find these to be true statements, but I don't see how they support or
refute that which came before.
On Sun, Nov 18, 2012 at 3:58 PM, Philippe Verdy verd...@wanadoo.fr wrote:
The same chapter makes a normative reference to ISO/IEC 2022 for C0
controls, it does not say that this concerns
On Sat, Nov 17, 2012 at 10:52 AM, Shawn Steele
shawn.ste...@microsoft.com wrote:
IMO this isn’t worth the effort being spent on it. MOST encodings have
all sorts of interesting quirks, variations, OEM or App specific behavior,
etc. These are a few code points that haven’t really caused much
What effort has been spent? This is not an either/or type of proposition.
If we can agree that it's an improvement (albeit small), let's update the
mapping.
Is it much harder than I believe it is?
What if some application's treating it as undefined? And now the code page
gets updated to
So don't say that there are one-for-one equivalences.
I was just quoting this section of the standard:
http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
There is a simple, one-to-one mapping between 7-bit (and 8-bit) control
codes and the Unicode control codes: every 7-bit (or 8-bit)
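The one-to-one mapping that chapter describes can be checked directly with latin-1, which maps every byte to the Unicode code point of the same value, C1 controls included (a quick Python sketch, not from the thread):

```python
# latin-1 (ISO 8859-1 as Unicode sees it) maps every byte 0x00-0xFF to
# the code point with the same value, including the C1 range 0x80-0x9F.
raw = bytes(range(256))
text = raw.decode('latin-1')

assert text[0x81] == '\u0081'          # 0x81 becomes the C1 control U+0081
assert text.encode('latin-1') == raw   # the mapping round-trips losslessly
```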
: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf
Of Buck Golemon
Sent: Saturday, November 17, 2012 8:35 AM
To: verd...@wanadoo.fr
Cc: Doug Ewell; unicode
Subject: Re: cp1252 decoder implementation
So don't say that there are one-for-one equivalences.
I was just quoting
cp1252 (aka windows-1252) defines 27 characters which iso-8859-1 does not.
This leaves five bytes with undefined semantics.
Currently the python cp1252 decoder allows us to ignore/replace/error on
these bytes, but there's no facility for allowing these unknown bytes to
round-trip through the
So I find that the unicode.org cp1252 file leaves those bytes undefined as
well, so the issue stems from there.
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and to
map it to the equally-non-semantic
Buck Golemon wrote:
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
to map it to the equally-non-semantic U+81 ?
This would allow systems that follow the html5 standard and use cp1252
in place of latin1 to continue to be binary-faithful and reversible.
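The round-trip behavior described here can be sketched in Python with a custom codecs error handler; the handler name `c1passthrough` and the `encode_cp1252` helper are illustrative, not part of any standard or library:

```python
import codecs

# The five cp1252 bytes left undefined by the unicode.org mapping.
C1_HOLES = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

def c1_passthrough(exc):
    # Decode each undefined cp1252 byte as the C1 control of the same value.
    if isinstance(exc, UnicodeDecodeError):
        repl = ''.join(chr(b) for b in exc.object[exc.start:exc.end])
        return repl, exc.end
    raise exc

codecs.register_error('c1passthrough', c1_passthrough)

def encode_cp1252(text):
    # Inverse direction: emit U+0081 etc. as the single byte of the same value.
    return b''.join(
        bytes([ord(ch)]) if ord(ch) in C1_HOLES else ch.encode('cp1252')
        for ch in text
    )

raw = bytes(range(0x80, 0xA0))
text = raw.decode('cp1252', errors='c1passthrough')
assert encode_cp1252(text) == raw   # binary-faithful and reversible
```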
This isn't quite
Quoting Buck Golemon b...@yelp.com:
cp1252 (aka windows-1252) defines 27 characters which iso-8859-1 does not.
This leaves five bytes with undefined semantics.
Currently the python cp1252 decoder allows us to ignore/replace/error on
these bytes, but there's no facility for allowing these
Golemon; unicode
Subject: Re: cp1252 decoder implementation
Buck Golemon wrote:
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
to map it to the equally-non-semantic U+81 ?
This would allow systems that follow the html5 standard and use cp1252
in place of latin1
1)
I did this and was criticized for inventing my own frankensteined
encoding, although I believe it's conceptually consistent with the idea
that cp1252 is to be used as a superset of latin1.
It's true that what I wrote is not consistent with the unicode.org definition:
On Fri, Nov 16, 2012 at 4:11 PM, Doug Ewell d...@ewellic.org wrote:
Buck Golemon wrote:
Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
to map it to the equally-non-semantic U+81 ?
This would allow systems that follow the html5 standard and use cp1252
in place of
Buck Golemon wrote:
This isn't quite as black-and-white as the question about Latin-1. If
you are targeting HTML5, you are probably safe in treating an
incoming 0x81 (for example) as either U+0081 or U+FFFD, or throwing
some kind of error.
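The three behaviors listed here correspond to choices already expressible in Python (a quick illustration; decoding as latin-1 is what effectively yields U+0081, matching the HTML5 treatment of these bytes):

```python
raw = b'\x81'

# Option 1: treat it as the C1 control U+0081 (latin-1 maps byte == code point).
assert raw.decode('latin-1') == '\u0081'

# Option 2: replace it with U+FFFD.
assert raw.decode('cp1252', errors='replace') == '\ufffd'

# Option 3: throw an error, which is Python's strict default for cp1252.
try:
    raw.decode('cp1252')
except UnicodeDecodeError:
    pass
else:
    raise AssertionError('expected UnicodeDecodeError')
```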
Why do you make this conditional on targeting html5?
If you are thinking about byte values you are working at the encoding
scheme level (in fact another lower level which defines a protocol
presentation layer, e.g. transport syntaxes in MIME). Unicode codepoints
are conceptually not an encoding scheme, just a coded character set
(independent of the