RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-06-07 Thread Christopher John Fynn
Simon Law wrote: In Oracle9i our next Database Release shipping this summer, we have introduced support for two new Unicode character sets. ... New character *sets* ???

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-31 Thread Carl W. Brown
Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Simon Law Sent: Wednesday, May 30, 2001 11:02 AM To: [EMAIL PROTECTED] Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Hi Folks, Over the last few days, this ema

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-31 Thread Ayers, Mike
If you have this funny encoding please don't call it UTF8 because it is not UTF8 and will only confuse users. You could call it OTF8 or something like that but not UTF8. How about WTF-8? Sorry - I couldn't resist. /|/|ike

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-31 Thread Ayers, Mike
From: Carl W. Brown [mailto:[EMAIL PROTECTED]] I resisted calling it FTF-8 (Funky Transfer Format - 8), but if you want to call it Weird Transfer Format - 8, I don't have any real objections. Well, that's ONE possible translation of WTF... /|/|ike

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-30 Thread Simon Law
: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Carl, > Ken, > > UTF-8s is essentially a way to ignore surrogate processing. It allows a > company to encode UTF-16 with UCS-2 logic. > > The problem i

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-30 Thread Ayers, Mike
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] According to the proposal, UTF-8S and UTF-32S would not have the same status: they wouldn't be for interchange; they'd just be for representation internal to a given system, like UTF-EBCDIC (which, I think I heard, has not actually

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-30 Thread Carl W. Brown
system that sort like UTF-16 is folly. Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Simon Law Sent: Wednesday, May 30, 2001 11:02 AM To: [EMAIL PROTECTED] Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Hi

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-30 Thread Michael (michka) Kaplan
Simon, Would you care to answer (officially) why exactly Oracle needs for anything to be done here? Per the spec, it is not illegal for a process to interpret 5/6-byte supplementary characters; it is only illegal to emit them. It seems that Oracle and everyone else is well covered with the

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-30 Thread Michael (michka) Kaplan
someone emits the b michka - Original Message - From: Simon Law [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, May 30, 2001 11:01 AM Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Hi Folks, Over the last few days, this email thread has generated many

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Peter_Constable
On 05/27/2001 08:03:37 PM Jianping Yang wrote: But it seems to me that we've lived without Premise B in the past, and that it won't benefit us to adopt it now. Why bother with it? Why not continue doing what we already know how to do? As a matter of fact, the surrogate or supplementary

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Kenneth Whistler
Doug wrote: UTF-8 and UTF-32 should absolutely not be similarly hacked to maintain some sort of bizarre compatibility with the binary sorting order of UTF-16. UTC should not, and almost certainly will not, endorse such a proposal on the part of the database vendors. I would be loath

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Jianping Yang
Antoine Leca wrote: Jianping Yang wrote: As a matter of fact, the surrogate or supplementary character was not defined in the past, How long is the past? I remember reading about these surrogates the first time I put my hands on a draft copy of ISO 10646. It was nearly six years ago.

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Carl W. Brown
. Carl -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Kenneth Whistler Sent: Tuesday, May 29, 2001 11:18 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Doug

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Kenneth Whistler
Carl, Ken, UTF-8s is essentially a way to ignore surrogate processing. It allows a company to encode UTF-16 with UCS-2 logic. The problem is that by not implementing surrogate support you can introduce subtle errors. For example it is common to break buffers apart into segments.
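As an illustration of the encoding discussed in this message, here is a minimal Python sketch of how a UTF-8S-style encoder could behave. It assumes, as the proposal was generally understood, that each UTF-16 code unit (surrogates included) is converted to UTF-8 on its own. The function name encode_utf8s is hypothetical and is not taken from any vendor's implementation.

```python
import struct

def encode_utf8s(text):
    """Encode text one UTF-16 code unit at a time (hypothetical UTF-8S-style sketch)."""
    out = bytearray()
    # Split the text into 16-bit big-endian code units, as UCS-2 logic would see it.
    for (unit,) in struct.iter_unpack(">H", text.encode("utf-16-be")):
        # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate,
        # which standard UTF-8 forbids.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)

standard = "\U00010000".encode("utf-8")   # standard UTF-8: four bytes
variant = encode_utf8s("\U00010000")      # sketch: two 3-byte sequences, six bytes
```

Under this assumption U+10000 comes out as the six bytes ED A0 80 ED B0 80 instead of the four bytes F0 90 80 80 of standard UTF-8, while text confined to plane 0 is encoded identically by both.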

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Carl W. Brown
, 2001 3:47 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Carl, Ken, UTF-8s is essentially a way to ignore surrogate processing. It allows a company to encode UTF-16 with UCS-2 logic. The problem

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread てんどうりゅうじ
"Carl W. Brown" [EMAIL PROTECTED]; To: [EMAIL PROTECTED]; Cc: Date: 01/05/30 0:46 Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) Ken, I suspect that Oracle is specifically pushing for this standard because of its unique data base design

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-28 Thread てんどうりゅうじ
★じゅういっちゃん★ EKYWY TXLY NPZ P MPVD XPHYV LPWWQY NKT ZPN XT WYPZTX PE PMM ET HPWWD "EYX EKTSZPXV'Z HTWY GSX P XSHOYW EKPX TXY PXV LTHHQEHYXE, ET HY, QZ RSQEY ZLPWD" There was another abomination proposed. Oracle rather than adding UTF-16 support proposed that non plane 0

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-28 Thread Antoine Leca
Jianping Yang wrote: As a matter of fact, the surrogate or supplementary character was not defined in the past, How long is the past? I remember reading about these surrogates the first time I put my hands on a draft copy of ISO 10646. It was nearly six years ago. Or do you mean that it was

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-28 Thread DougEwell2
In a message dated 2001-05-26 16:00:47 Pacific Daylight Time, [EMAIL PROTECTED] writes: The issue is this: Unicode's three encoding forms don't sort in the same way when sorting is done using that most basic and valid-in-almost-no-locales-but-easy-and-quick approach of simply comparing
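The sorting difference described here can be reproduced with a short Python sketch. The two sample characters and the helper name binary_order are chosen purely for illustration; the sketch compares the raw encoded bytes of a character from the top of plane 0 with those of a supplementary character.

```python
bmp = "\uFF5E"        # U+FF5E, a plane 0 character above the surrogate range
supp = "\U00010000"   # U+10000, the first supplementary character

def binary_order(a, b, encoding):
    """Return -1, 0 or 1 from a plain byte-wise comparison of the encoded forms."""
    ea, eb = a.encode(encoding), b.encode(encoding)
    return (ea > eb) - (ea < eb)

utf8_order = binary_order(bmp, supp, "utf-8")       # EF BD 9E    vs F0 90 80 80
utf16_order = binary_order(bmp, supp, "utf-16-be")  # FF 5E       vs D8 00 DC 00
utf32_order = binary_order(bmp, supp, "utf-32-be")  # 00 00 FF 5E vs 00 01 00 00
```

In UTF-8 and UTF-32 the plane 0 character sorts first, matching code point order. In UTF-16 it sorts last, because the surrogate code units D800..DFFF are numerically lower than the code units E000..FFFF.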

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-28 Thread Carl W. Brown
: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED] Sent: Monday, May 28, 2001 3:30 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) In a message dated 2001-05-26 16:00:47 Pacific Daylight Time, [EMAIL

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-28 Thread Michael (michka) Kaplan
From: Jianping Yang [EMAIL PROTECTED] As a matter of fact, the surrogate or supplementary character was not defined in the past, so we could live without Premise B in the past. But now the supplementary character is defined and will soon be supported, we have to bother with it. Poor

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-27 Thread Jianping Yang
I don't want to argue on this lengthy email, but only point two facts: According to the proposal, UTF-8S and UTF-32S would not have the same status: they wouldn't be for interchange; they'd just be for representation internal to a given system, like UTF-EBCDIC (which, I think I heard, has not

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-26 Thread Peter_Constable
If you think something abominable is happening, please raise a loud voice and flood UTC members with e-mail and tell everyone what you think and why you think it. Nobody can hear you when you mumble. And it helps if you have solid technical and philosophical arguments to convey. Well, I

RE: UTF-8 signature in web and email

2001-05-25 Thread てんどうりゅうじ
★じゅういっちゃん★ Encoding-aware programs that "understand" Unicode should treat U+FEFF according to its literal meaning: "a non-breaking space having zero width". I take it that U+FEFF is the Cheshire Cat's favorite character. What about that CLOSED OPEN E, also? I got quite a

RE: UTF-8 signature in web and email

2001-05-25 Thread Bill Kurmey
Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4 octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)? Is Unicode UTF-8 diverging from ISO by the way in which a scalar value is encoded in UTF-8? Should folks be concerned that the IETF RFC-2279 and RFC-2781

ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-25 Thread Peter_Constable
On 05/25/2001 02:13:36 AM Bill Kurmey wrote: Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4 octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)? The distinction between the Unicode and ISO versions of UTF-8 is pretty irrelevant. ISO UTF-8 allows a

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-25 Thread Carl W. Brown
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED] Sent: Friday, May 25, 2001 8:29 AM To: [EMAIL PROTECTED] Subject: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email) On 05/25/2001 02:13:36 AM Bill Kurmey wrote: Are there not 2

RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-25 Thread Carl W. Brown
Unicode UTF-8 (was RE: UTF-8 signature in web and email) On 05/25/2001 12:21:13 PM Carl W. Brown wrote: Peter, There was another abomination proposed. I was choosing not to mention the abominable. - Peter --- Peter

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-25 Thread Rick McGowan
Some people said things like... There was another abomination proposed. I was choosing not to mention the abominable. The abominable steam-rollers of history squish those who don't scream and run; and the few weak survivors are forever cleaning up the resulting messes. If you think

RE: UTF-8 signature in web and email

2001-05-24 Thread David Starner
At 11:35 AM 05/23/2001 +0200, Marco Cimarosti wrote: David Starner wrote: You're asking for every program to treat UTF-8 specially. No I am not! I have been saying the exact opposite! [...] [...] of now, UTF-8 is just one of many charsets in use on Unix. In fact! So why do Unixers worry

RE: UTF-8 signature in web and email

2001-05-24 Thread Marco Cimarosti
David Starner wrote: of now, UTF-8 is just one of many charsets in use on Unix. In fact! So why do Unixers worry about bytes 0xEF, 0xBB, 0xBF [...] Because if 0xA0 or 0xA1 0xA1 (or 0x20) show at the start of a script, it's wrong. [...] OK. I had written a reply to all your points but

RE: UTF-8 signature in web and email

2001-05-23 Thread Marco Cimarosti
David Starner wrote: You're asking for every program to treat UTF-8 specially. No I am not! I have been saying the exact opposite! ZWNBSP is just one more multibyte character and UTF-8 is just one more multibyte encoding. Why should this case be so special? [...] of now, UTF-8 is just one

RE: UTF-8 signature in web and email

2001-05-23 Thread Marco Cimarosti
John Cowan wrote: Well, C-like language is a hedge. IIRC, C99 thinks everything above U+007F is a letter. OK, it was a hedge. I just wanted a scenario of plain text usage familiar to programmers, and where visualization was not the main thing. You can choose another example of your choice.

Re: UTF-8 signature in web and email

2001-05-22 Thread DougEwell2
In a message dated 2001-05-18 13:25:06 Pacific Daylight Time, [EMAIL PROTECTED] writes: Last year, as previously the year before, we discussed the possibility of defining some standard Unicode plain text formats. The discussions foundered on the differences between text files meant for

RE: UTF-8 signature in web and email

2001-05-22 Thread Marco Cimarosti
David Starner wrote: [...] At the fundamental heart of a Unix system is passing arbitrary byte streams in highly flexible ways. If every file starts with a signature then that makes that significantly more complex. [...] You forget one fundamental thing about U+FEFF: it is not (only) a byte

RE: UTF-8 signature in web and email

2001-05-22 Thread David Starner
At 11:14 AM 05/22/2001 +0200, you wrote: But, also in this case, why should it be a problem to have ZWNBSP in whatever position in a file? Why should *this* character be more of a problem than SPACE, or TAB, or CARRIAGE RETURN, or COMMA, or you name it? Because SPACE, TAB, CARRIAGE RETURN, or COMMA

Re: UTF-8 signature in web and email

2001-05-22 Thread Roozbeh Pournader
On 23 May 2001, Juliusz Chroboczek wrote: Heck, MS-DOS doesn't even have the concept of concatenating plain files! I'm sorry I don't get you. There is the DOS command COPY A+B C for that, with /A and /B switches for ASCII and binary files, and I have used that for years. What do you mean by

Re: UTF-8 signature in web and email

2001-05-21 Thread DougEwell2
In a message dated 2001-05-18 0:50:13 Pacific Daylight Time, [EMAIL PROTECTED] writes: People using this heuristic, who didn't really think it would work that well after the talk, have confirmed later that it actually works extremely well (and they were writing production code, not just

Re: UTF-8 signature in web and email

2001-05-18 Thread Martin Duerst
At 22:58 01/05/17 -0400, [EMAIL PROTECTED] wrote: Martin Dürst wrote: There is about 5% of a justification for having a 'signature' on a plain-text, standalone file (the reason being that it's somewhat easier to detect that the file is UTF-8 from the signature than to read through

Re: UTF-8 signature in web and email

2001-05-18 Thread Edward Cherlin
At 10:58 PM -0400 5/17/01, [EMAIL PROTECTED] wrote: The UTF-8 signature discussion appears every few months on this list, usually as a religious debate between those who believe in it and those who do not. Be forewarned, my religion may not match yours. :-) My religion suggests that we find

Re: UTF-8 signature in web and email

2001-05-18 Thread Michael (michka) Kaplan
michka the only book on internationalization in VB at http://www.i18nWithVB.com/ - Original Message - From: Edward Cherlin [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Friday, May 18, 2001 1:08 PM Subject: Re: UTF-8 signature in web and email At 10:58 PM -0400 5/17/01, [EMAIL

Re: UTF-8 signature in web and email

2001-05-18 Thread Michael (michka) Kaplan
From: Edward Cherlin [EMAIL PROTECTED] A text file with a BOM is, if not rich text, at least above the poverty line. (modified from Ed's prior msg -- this one is a keeper!) michka

Re: UTF-8 signature in web and email

2001-05-17 Thread DougEwell2
The UTF-8 signature discussion appears every few months on this list, usually as a religious debate between those who believe in it and those who do not. Be forewarned, my religion may not match yours. :-) Keld Jørn Simonsen wrote: For UTF-8 there is no need to have a BOM, as there is only

Re: UTF-8 signature in web and email

2001-05-16 Thread Martin Duerst
Hello Roozbeh At 04:02 01/05/15 +0430, Roozbeh Pournader wrote: Well, I received a UTF-8 email from Microsoft's Dr International today. It was a multipart/alternative, with both the text/plain and text/html in UTF-8. Well, nothing interesting yet, but the interesting point was that the HTML

RE: UTF-8 signature in web and email

2001-05-16 Thread Marco Cimarosti
Keld Jørn Simonsen wrote: For UTF-8 there is no need to have a BOM, as there is only one way of serializing octets in UTF-8. There is no little-endian or big-endian. A BOM is superfluous and will be ignored. Not so. In plain text, it is a useful signature to distinguish UTF-8 from other
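For readers unfamiliar with the signature under debate, the following Python sketch shows one way a program might test for it and strip it. The helper name split_signature is hypothetical. The three bytes EF BB BF are the UTF-8 encoding of U+FEFF, which Python exposes as codecs.BOM_UTF8.

```python
import codecs

def split_signature(data):
    """Return (has_signature, payload) for a byte string (illustrative helper)."""
    # codecs.BOM_UTF8 is b"\xef\xbb\xbf", the UTF-8 form of U+FEFF.
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data

with_sig = split_signature(b"\xef\xbb\xbfhello")
without_sig = split_signature(b"hello")
```

A file without the signature may still be UTF-8, so a negative result here says nothing about the encoding. That is the point on which the two sides of this thread disagree.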

Re: UTF-8 signature in web and email

2001-05-16 Thread Michael (michka) Kaplan
: Martin Duerst [EMAIL PROTECTED] To: Roozbeh Pournader [EMAIL PROTECTED]; Unicode List [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Tuesday, May 15, 2001 6:55 PM Subject: Re: UTF-8 signature in web and email Hello Roozbeh At 04:02 01/05/15 +0430, Roozbeh Pournader wrote: Well, I received a UTF-8

Re: UTF-8 signature in web and email

2001-05-16 Thread Mark Davis
PROTECTED] Cc: [EMAIL PROTECTED] Sent: Wednesday, May 16, 2001 00:57 Subject: Re: UTF-8 signature in web and email For UTF-8 there is no need to have a BOM, as there is only one way of serializing octets in UTF-8. There is no little-endian or big-endian. A BOM is superfluous

Re: UTF-8 signature in web and email

2001-05-16 Thread Bill Kurmey
Delurking for a moment for a few points of clarification please. What is the definition of 'signature'? Does 'signature' in this thread's context, include the XML 4-byte declarations (charset.html#h-5.2.1) without the BOM as defined in this section? Are you folks advocating that the BOM is

RE: UTF-8 signature in web and email

2001-05-15 Thread Roozbeh Pournader
On Tue, 15 May 2001, Richard, Francois M wrote: UTF-8 is considered as a character encoding form as any other... For UTF-16 only, the BOM is recommended. See http://www.w3.org/TR/REC-html40/charset.html#h-5.2.1 So BOM for UTF-8 HTML is neither recommended nor discouraged? Does anyone agree

Re: UTF-8 signature in web and email

2001-05-15 Thread Misha Wolf
This mail, addressed to [EMAIL PROTECTED], was, presumably, intended for [EMAIL PROTECTED]. Misha On 15/05/2001 00:32:24 Roozbeh Pournader wrote: Well, I received a UTF-8 email from Microsoft's Dr International today. It was a multipart/alternative, with both the text/plain and text/html