Simon Law wrote:
In Oracle9i, our next Database Release shipping this summer, we have introduced
support for two new Unicode character sets. ...
New character *sets* ???
Carl
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Simon Law
Sent: Wednesday, May 30, 2001 11:02 AM
To: [EMAIL PROTECTED]
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)
Hi Folks,
Over the last few days, this ema
If you have this funny encoding please don't call it UTF8 because it is not
UTF8 and will only confuse users. You could call it OTF8 or something like
that but not UTF8.
How about WTF-8?
Sorry - I couldn't resist.
/|/|ike
From: Carl W. Brown [mailto:[EMAIL PROTECTED]]
I resisted calling it FTF-8 (Funky Transfer Format - 8), but if you want to
call it Weird Transfer Format - 8, I don't have any real objections.
Well, that's ONE possible translation of WTF...
/|/|ike
: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
email)
Carl,
> Ken,
>
> UTF-8s is essentially a way to ignore surrogate processing. It allows a
> company to encode UTF-16 with UCS-2 logic.
>
> The problem i
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
According to the proposal, UTF-8S and UTF-32S would not have the same
status: they wouldn't be for interchange; they'd just be for representation
internal to a given system, like UTF-EBCDIC (which, I think I heard, has
not actually
system that sort like
UTF-16 is folly.
Carl
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Simon Law
Sent: Wednesday, May 30, 2001 11:02 AM
To: [EMAIL PROTECTED]
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)
Hi
Simon,
Would you care to answer (officially) why exactly Oracle needs for anything
to be done here? Per the spec, it is not illegal for a process to interpret
5/6-byte supplementary characters; it is only illegal to emit them. It seems
that Oracle and everyone else is well covered with the
someone emits the b
michka
- Original Message -
From: Simon Law [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, May 30, 2001 11:01 AM
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)
Hi Folks,
Over the last few days, this email thread has generated many
On 05/27/2001 08:03:37 PM Jianping Yang wrote:
But it seems to me that we've lived without
Premise B in the past, and that it won't benefit us to adopt it now. Why
bother with it? Why not continue doing what we already know how to do?
As a matter of fact, the surrogate or supplementary
Doug wrote:
UTF-8 and UTF-32 should absolutely not be similarly hacked to maintain some
sort of bizarre compatibility with the binary sorting order of UTF-16.
UTC should not, and almost certainly will not, endorse such a proposal on the
part of the database vendors.
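
The sorting mismatch under discussion can be reproduced with two characters. The following is a minimal sketch, not anything posted in the thread: the two sample characters and the helper name utf16_units are chosen here purely for illustration. It compares code point order, UTF-8 byte order, and UTF-16 code unit order for one BMP character above the surrogate range and one supplementary character.

```python
# Illustrative sample characters (an assumption of this sketch, not from the thread).
bmp_char = "\uFF5E"        # U+FF5E FULLWIDTH TILDE, a BMP character above the surrogate range
supp_char = "\U00010000"   # U+10000, the first supplementary-plane code point


def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Python compares strings by code point, which is also UTF-32 order.
codepoint_order = sorted([supp_char, bmp_char])

# Binary comparison of the UTF-8 bytes gives the same order as code points.
utf8_order = sorted([supp_char, bmp_char], key=lambda s: s.encode("utf-8"))

# Binary comparison of UTF-16 code units puts the surrogate pair
# (0xD800 0xDC00) before the single unit 0xFF5E, reversing the order.
utf16_order = sorted([supp_char, bmp_char], key=utf16_units)

print([hex(ord(c)) for c in codepoint_order])
print([hex(ord(c)) for c in utf8_order])
print([hex(ord(c)) for c in utf16_order])
```

Only characters in the range U+E000..U+FFFF are affected relative to supplementary characters; everything else sorts identically in all three forms.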
I would be loath
Antoine Leca wrote:
Jianping Yang wrote:
As a matter of fact, the surrogate or supplementary character was not defined
in the past,
How long is the past? I remember reading about these surrogates the first
time I put my hands on a draft copy of ISO 10646. It was nearly six years ago.
Carl
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Kenneth Whistler
Sent: Tuesday, May 29, 2001 11:18 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
email)
Doug
Carl,
Ken,
UTF-8s is essentially a way to ignore surrogate processing. It allows a
company to encode UTF-16 with UCS-2 logic.
The problem is that by not implementing surrogate support you can introduce
subtle errors. For example it is common to break buffers apart into
segments.
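
The buffer-splitting hazard can be sketched as follows. This is a hedged illustration only: the function name split_utf16_units, the chunk size, and the sample text are assumptions made here, not code from the thread. It splits a sequence of UTF-16 code units into fixed-size segments while refusing to separate a high surrogate from the low surrogate that follows it.

```python
def to_utf16_units(text):
    """Convert a Python string to a list of UTF-16 code units."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def split_utf16_units(units, size):
    """Split code units into chunks of at most `size` units without
    cutting between a high surrogate and its low surrogate."""
    if size < 2:
        raise ValueError("size must be at least 2 to hold a surrogate pair")
    chunks = []
    start = 0
    while start < len(units):
        end = min(start + size, len(units))
        # If more data follows and the chunk would end on a high surrogate
        # (0xD800..0xDBFF), move the boundary back by one unit.
        if end < len(units) and 0xD800 <= units[end - 1] <= 0xDBFF:
            end -= 1
        chunks.append(units[start:end])
        start = end
    return chunks


# "ab" + U+1D11E MUSICAL SYMBOL G CLEF + "c"
units = to_utf16_units("ab\U0001D11Ec")

# UCS-2 style splitting by count alone cuts the pair in half.
naive = [units[i:i + 3] for i in range(0, len(units), 3)]

# Surrogate-aware splitting keeps the pair together.
safe = split_utf16_units(units, 3)
```

With a segment size of 3, the naive split leaves 0xD834 at the end of the first segment and 0xDD1E at the start of the second, which is the kind of subtle error described above.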
, 2001 3:47 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
email)
Carl,
Ken,
UTF-8s is essentially a way to ignore surrogate processing. It allows a
company to encode UTF-16 with UCS-2 logic.
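
To make "encode UTF-16 with UCS-2 logic" concrete, here is a minimal sketch. It assumes, based only on the description above, that each 16-bit code unit (including each surrogate) is encoded separately with the ordinary UTF-8 bit pattern; the function name encode_unit_wise is hypothetical and the thread itself does not spell out a byte layout.

```python
def encode_unit_wise(text):
    """Encode every UTF-16 code unit of `text` separately using the
    UTF-8 bit pattern (an assumed reading of the 'UTF-8s' idea)."""
    data = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        if unit < 0x80:
            out.append(unit)
        elif unit < 0x800:
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # Surrogates (0xD800..0xDFFF) fall through to here and get
            # three bytes each, so a pair becomes six bytes.
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


sample = "\U00010400"  # a supplementary-plane character, chosen for illustration

standard = sample.encode("utf-8")      # F0 90 90 80 (4 bytes)
unit_wise = encode_unit_wise(sample)   # ED A0 81 ED B0 80 (6 bytes)

# A strict UTF-8 decoder rejects the six-byte form.
try:
    unit_wise.decode("utf-8")
    strict_decoder_accepts = True
except UnicodeDecodeError:
    strict_decoder_accepts = False
```

For BMP text the two encodings are byte-for-byte identical, which is why the difference only surfaces once supplementary characters appear in the data.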
The problem
"Carl W. Brown" [EMAIL PROTECTED];
To: [EMAIL PROTECTED];
Cc:
Date: 01/05/30 0:46
Subject: RE: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)
Ken,
I suspect that Oracle is specifically pushing for this standard because of
its unique data base design
★じゅういっちゃん★
EKYWY TXLY NPZ P MPVD XPHYV LPWWQY
NKT ZPN XT WYPZTX PE PMM ET HPWWD
"EYX EKTSZPXV'Z HTWY GSX
P XSHOYW EKPX TXY
PXV LTHHQEHYXE, ET HY, QZ RSQEY ZLPWD"
There was another abomination proposed. Oracle, rather than adding UTF-16
support, proposed that non plane 0
Jianping Yang wrote:
As a matter of fact, the surrogate or supplementary character was not defined
in the past,
How long is the past? I remember reading about these surrogates the first
time I put my hands on a draft copy of ISO 10646. It was nearly six years ago.
Or do you mean that it was
In a message dated 2001-05-26 16:00:47 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
The issue is this: Unicode's three encoding forms don't sort in the same
way when sorting is done using that most basic and
valid-in-almost-no-locales-but-easy-and-quick approach of simply comparing
: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of [EMAIL PROTECTED]
Sent: Monday, May 28, 2001 3:30 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and
email)
In a message dated 2001-05-26 16:00:47 Pacific Daylight Time,
[EMAIL
From: Jianping Yang [EMAIL PROTECTED]
As a matter of fact, the surrogate or supplementary
character was not defined in the past, so we could
live without Premise B in the past. But now the
supplementary character is defined and will soon be
supported, we have to bother with it.
Poor
I don't want to argue on this lengthy email, but only point two facts:
According to the proposal, UTF-8S and UTF-32S would not have the same
status: they wouldn't be for interchange; they'd just be for representation
internal to a given system, like UTF-EBCDIC (which, I think I heard, has
not
If you think something abominable is happening, please raise a loud voice
and flood UTC members with e-mail and tell everyone what you think and why
you think it. Nobody can hear you when you mumble.
And it helps if you have solid technical and philosophical arguments to
convey.
Well, I
★じゅういっちゃん★
Encoding-aware programs that "understand" Unicode should treat U+FEFF
according to its literal meaning: "a non-breaking space having zero width".
I take it that U+FEFF is the Cheshire Cat's favorite character. What about that CLOSED
OPEN E, also? I got quite a
Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4
octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)?
Is Unicode UTF-8 diverging from ISO by the way in which a scalar value is
encoded in UTF-8? Should folks be concerned that the IETF RFC-2279 and
RFC-2781
On 05/25/2001 02:13:36 AM Bill Kurmey wrote:
Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4
octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)?
The distinction between the Unicode and ISO versions of UTF-8 is pretty
irrelevant. ISO UTF-8 allows a
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of [EMAIL PROTECTED]
Sent: Friday, May 25, 2001 8:29 AM
To: [EMAIL PROTECTED]
Subject: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)
On 05/25/2001 02:13:36 AM Bill Kurmey wrote:
Are there not 2
Unicode UTF-8 (was RE: UTF-8 signature in web and
email)
On 05/25/2001 12:21:13 PM Carl W. Brown wrote:
Peter,
There was another abomination proposed.
I was choosing not to mention the abominable.
- Peter
---
Peter
Some people said things like...
There was another abomination proposed.
I was choosing not to mention the abominable.
The abominable steam-rollers of history squish those who don't scream and
run; and the few weak survivors are forever cleaning up the resulting
messes.
If you think
At 11:35 AM 05/23/2001 +0200, Marco Cimarosti wrote:
David Starner wrote:
You're asking for every program to treat UTF-8 specially.
No I am not! I have been saying the exact opposite!
[...]
[...]
of now, UTF-8 is just one of many charsets in use on Unix.
In fact! So why do Unixers worry
David Starner wrote:
of now, UTF-8 is just one of many charsets in use on Unix.
In fact! So why do Unixers worry about bytes 0xEF, 0xBB,
0xBF [...]
Because if 0xA0 or 0xA1 0xA1 (or 0x20) show at the start of a script,
it's wrong. [...]
OK. I had written a reply to all your points but
David Starner wrote:
You're asking for every program to treat UTF-8 specially.
No I am not! I have been saying the exact opposite!
ZWNBSP is just one more multibyte character and UTF-8 is just one more
multibyte encoding. Why should this case be so special?
[...]
of now, UTF-8 is just one
John Cowan wrote:
Well, C-like language is a hedge. IIRC, C99 thinks
everything above U+007F is a letter.
OK, it was a hedge. I just wanted a scenario of plain text usage familiar to
programmers, and where visualization was not the main thing.
You can choose another example of your choice.
In a message dated 2001-05-18 13:25:06 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
Last year, as previously the year before, we discussed the
possibility of defining some standard Unicode plain text formats. The
discussions foundered on the differences between text files meant for
David Starner wrote:
[...] At the fundamental heart of a Unix system is
passing arbitrary byte streams in highly flexible
ways. If every file starts with a signature then
that makes that significantly more complex. [...]
You forget one fundamental thing about U+FEFF: it is not (only) a byte
At 11:14 AM 05/22/2001 +0200, you wrote:
But, also in this case, why should it be a problem to have ZWNBSP in
whatever position in a file? Why should *this* character be more of a problem
than SPACE, or TAB, or CARRIAGE RETURN, or COMMA, or name it?
Because SPACE, TAB, CARRIAGE RETURN, or COMMA
On 23 May 2001, Juliusz Chroboczek wrote:
Heck, MS-DOS doesn't even have the concept of concatenating plain
files!
I'm sorry I don't get you. There is the DOS command COPY A+B C for that,
with /A and /B switches for ASCII and binary files, and I have used
that for years. What do you mean by
In a message dated 2001-05-18 0:50:13 Pacific Daylight Time, [EMAIL PROTECTED]
writes:
People using this heuristic, who didn't really think it would
work that well after the talk, have confirmed later that it
actually works extremely well (and they were writing production
code, not just
At 22:58 01/05/17 -0400, [EMAIL PROTECTED] wrote:
Martin Dürst wrote:
There is about 5% of a justification
for having a 'signature' on a plain-text, standalone file (the reason
being that it's somewhat easier to detect that the file is UTF-8 from the
signature than to read through
At 10:58 PM -0400 5/17/01, [EMAIL PROTECTED] wrote:
The UTF-8 signature discussion appears every few months on this list,
usually as a religious debate between those who believe in it and those who
do not. Be forewarned, my religion may not match yours. :-)
My religion suggests that we find
michka
the only book on internationalization in VB at
http://www.i18nWithVB.com/
- Original Message -
From: Edward Cherlin [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, May 18, 2001 1:08 PM
Subject: Re: UTF-8 signature in web and email
At 10:58 PM -0400 5/17/01, [EMAIL
From: Edward Cherlin [EMAIL PROTECTED]
A text file with a BOM is, if not rich text, at least above the poverty
line.
(modified from Ed's prior msg -- this one is a keeper!)
michka
The UTF-8 signature discussion appears every few months on this list,
usually as a religious debate between those who believe in it and those who
do not. Be forewarned, my religion may not match yours. :-)
Keld Jørn Simonsen wrote:
For UTF-8 there is no need to have a BOM, as there is only
Hello Roozbeh
At 04:02 01/05/15 +0430, Roozbeh Pournader wrote:
Well, I received a UTF-8 email from Microsoft's Dr International today. It
was a multipart/alternative, with both the text/plain and text/html
in UTF-8. Well, nothing interesting yet, but the interesting point was
that the HTML
Keld Jørn Simonsen wrote:
For UTF-8 there is no need to have a BOM, as there is only one
way of serializing octets in UTF-8. There is no little-endian
or big-endian. A BOM is superfluous and will be ignored.
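
Whether a leading U+FEFF is ignored depends on the consumer. As a hedged illustration using Python's standard codecs (the sample payload and the helper name has_utf8_signature are assumptions of this sketch), the plain "utf-8" codec keeps the character in the decoded text, while "utf-8-sig" treats it as a signature and drops it, but only at the very start of the data.

```python
import codecs

# The UTF-8 signature is the three bytes EF BB BF (U+FEFF encoded in UTF-8).
payload = "hello".encode("utf-8")
data_with_signature = codecs.BOM_UTF8 + payload


def has_utf8_signature(data):
    """Report whether a byte string starts with the UTF-8 signature."""
    return data.startswith(codecs.BOM_UTF8)


# Plain UTF-8 decoding keeps U+FEFF as an ordinary character.
kept = data_with_signature.decode("utf-8")

# The "utf-8-sig" codec strips one leading signature if present.
stripped = data_with_signature.decode("utf-8-sig")

# Concatenating two signed files leaves a U+FEFF in the middle,
# which neither codec removes.
concatenated = (data_with_signature + data_with_signature).decode("utf-8-sig")
```

The concatenation case is the one raised elsewhere in this thread: a signature that is harmless at the start of a file becomes an embedded ZWNBSP once byte streams are joined.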
Not so. In plain text, it is a useful signature to distinguish UTF-8 from
other
: Martin Duerst [EMAIL PROTECTED]
To: Roozbeh Pournader [EMAIL PROTECTED]; Unicode List
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, May 15, 2001 6:55 PM
Subject: Re: UTF-8 signature in web and email
Hello Roozbeh
At 04:02 01/05/15 +0430, Roozbeh Pournader wrote:
Well, I received a UTF-8
PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Wednesday, May 16, 2001 00:57
Subject: Re: UTF-8 signature in web and email
For UTF-8 there is no need to have a BOM, as there is only one
way of serializing octets in UTF-8. There is no little-endian
or big-endian. A BOM is superfluous
Delurking for a moment for a few points of clarification please.
What is the definition of 'signature'? Does 'signature' in this thread's
context, include the XML 4-byte declarations (charset.html#h-5.2.1) without
the BOM as defined in this section?
Are you folks advocating that the BOM is
On Tue, 15 May 2001, Richard, Francois M wrote:
UTF-8 is considered as a character encoding form as any other...
For UTF-16 only, the BOM is recommended.
See http://www.w3.org/TR/REC-html40/charset.html#h-5.2.1
So BOM for UTF-8 HTML is neither recommended nor discouraged? Does anyone
agree
This mail, addressed to [EMAIL PROTECTED], was, presumably, intended
for [EMAIL PROTECTED].
Misha
On 15/05/2001 00:32:24 Roozbeh Pournader wrote:
Well, I received a UTF-8 email from Microsoft's Dr International today. It
was a multipart/alternative, with both the text/plain and text/html
51 matches