Carl W. Brown wrote:
Jianping,
In fact, Oracle 8.0 development started in 1992 and it was released in 1994,
which should be much earlier than NT 5.0.
Back then I was still using Oracle 7. Thank you for correcting me. What
made you choose UTF-8 back in 1992?
Oracle 7 supports
Carl W. Brown wrote:
I warned my clients not to use
surrogates with Oracle 8.x databases. I also cannot see how they could
be so short-sighted as not to develop a full UTF-8 encoder. If MS can put
surrogate support into Windows 2000, then they can put it into Oracle 8.0.
I am sure that
Carl W. Brown wrote:
If there are no surrogates in the database, is there any reason that I cannot
change the database from UTF8 to AL32UTF8?
You can change the database from UTF8 to AL32UTF8 in this case. You can also
use the Oracle database scanner to scan your UTF8 database and if you
Mark Davis wrote:
You are correct about the published definitions. As I recall, though, we
were referring to UTF-FSS as UTF-8 in the UTC meetings before it was changed
to account for UTF-16.
In any event, I don't know whether Oracle was involved in those discussions
or not, or whether
Markus Scherer wrote:
This means that Oracle mis-implemented the UTF-8 standard as it was specified at
that time, starting at least with Unicode 2.0.
No, Oracle does not mis-implement the UTF-8 standard; it only limits its support
to the BMP. Apart from the backward-compatibility reason,
Lisa Moore wrote:
Jianping wrote:
only Oracle provides fully UTF-8 and
UTF-16 support for RDBMS
Whoa... let me interject: DB2 for OS/390 supports UTF-8 and UTF-16. And DB2
for Intel and Unix supported both much earlier. I cannot speak to Jianping's
interpretation of fully
The fully
[EMAIL PROTECTED] wrote:
On 06/11/2001 10:45:46 PM Mark Davis wrote:
[earlier]
- Oracle could probably make a case for their name for UTF8 simply being an
anachronism. After all, the original definition of UTF-8 did convert
surrogate pairs as they are doing in what they call UTF8.
[EMAIL PROTECTED] wrote:
On 06/12/2001 01:13:48 PM Jianping Yang wrote:
If you convert ED A0 80 ED B0 80 into UTF-16, what does it mean then? I
think it definitely means U+10000.
I'd say not if that 6-byte sequence is interpreted in terms of *UTF-8*.
So UTF-8 is not compatible
One thing that needs clarifying here is that there is no four-byte encoding in
the UTF-8S proposal, and a four-byte sequence is illegal, not merely irregular. As
everything in UTF-8S is a perfect match to UTF-16, any criticism of this proposal
also applies to the UTF-16 encoding form.
Regards,
Jianping.
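To make the byte sequences in this exchange concrete, here is a minimal Python sketch of the UTF-8S idea as described in the thread (it is not Oracle's code or the proposal's reference implementation): each UTF-16 code unit is encoded on its own as UTF-8, so U+10000 (UTF-16 D800 DC00) becomes the six bytes ED A0 80 ED B0 80 rather than the standard four-byte F0 90 80 80. The function names `utf8s_encode` and `utf8s_decode` are hypothetical; the sketch relies on Python's `surrogatepass` error handler.

```python
def utf8s_encode(text: str) -> bytes:
    """Encode each UTF-16 code unit separately as UTF-8 (UTF-8S/CESU-8 style)."""
    raw = text.encode("utf-16-be")
    units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    # 'surrogatepass' lets a lone surrogate code unit be written as a 3-byte sequence.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


def utf8s_decode(data: bytes) -> str:
    """Recover the code units, then let the UTF-16 codec pair the surrogates."""
    units = data.decode("utf-8", "surrogatepass")
    return units.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


if __name__ == "__main__":
    char = "\U00010000"
    print(utf8s_encode(char).hex(" "))    # ed a0 80 ed b0 80
    print(char.encode("utf-8").hex(" "))  # f0 90 80 80
    try:
        bytes.fromhex("eda080edb080").decode("utf-8")
    except UnicodeDecodeError:
        print("strict UTF-8 decoder rejects the 6-byte form")
```

Note that this lenient decoder also accepts the standard four-byte form, so both byte sequences decode to the same character; a strict UTF-8 decoder, by contrast, rejects the six-byte form.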
Kenneth Whistler wrote:
Jianping wrote:
One thing that needs clarifying here is that there is no four-byte encoding in
the UTF-8S proposal, and a four-byte sequence is illegal, not merely irregular. As
everything in UTF-8S is a perfect match to UTF-16, any criticism of this proposal
also applies to UTF-16
Kenneth Whistler wrote:
Jianping responded:
Kenneth Whistler wrote:
Jianping wrote:
One thing that needs clarifying here is that there is no four-byte encoding in
the UTF-8S proposal, and a four-byte sequence is illegal, not merely irregular. As
everything in UTF-8S is a perfect match to
Is this the kind of language that should be used in a professional setting? I wonder
how this could happen on the Unicode mailing list!
Michael (michka) Kaplan wrote:
From: Rick McGowan [EMAIL PROTECTED]
... asking for a lavicious license to be lecherously lazy
Parse error at lavicious. No such word
Michael (michka) Kaplan wrote:
From: Jianping Yang [EMAIL PROTECTED]
If UTF-8S were by some miracle to be accepted by
the UTC, implementers would be put out and offended
for most of the next decade.
If it is, that is the rule of law from the UTC.
Very true.
devil's advocate
Ken,
Your analysis makes me even more convinced that we need UTF-8S, not only for
binary order but also because this ambiguity applies to both UTF-8S and UTF-16. As
the proposed UTF-8S encoding is logically equivalent to UTF-16, they share the same
property, which is different from UTF-8 and
but will not work with standard UTF-8 encoders and
decoders.
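The binary-order argument can be checked with a small Python sketch. It sorts two sample characters by their raw bytes in UTF-16, standard UTF-8, and a UTF-8S-style encoding; U+FF61 is an arbitrary choice (any BMP character from E000 to FFFF would do), and `utf8s_encode` is a hypothetical helper that encodes each UTF-16 code unit separately, as described in the thread.

```python
def utf8s_encode(text: str) -> bytes:
    # Hypothetical helper: encode each UTF-16 code unit separately as UTF-8.
    raw = text.encode("utf-16-be")
    units = (int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2))
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


# U+FF61 (a BMP character above the surrogate block) and U+10000 (supplementary).
samples = ["\uff61", "\U00010000"]

# Binary (byte-wise) sort order under each encoding form.
utf16_order = sorted(samples, key=lambda s: s.encode("utf-16-be"))
utf8_order = sorted(samples, key=lambda s: s.encode("utf-8"))
utf8s_order = sorted(samples, key=utf8s_encode)

for name, order in [("UTF-16", utf16_order), ("UTF-8", utf8_order), ("UTF-8S", utf8s_order)]:
    print(name, [f"U+{ord(c):04X}" for c in order])
```

In UTF-16 the surrogate D800 sorts below FF61, so U+10000 comes first; standard UTF-8 follows code point order and puts U+FF61 first; the UTF-8S-style bytes (ED ... versus EF ...) reproduce the UTF-16 order.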
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Jianping Yang
Sent: Thursday, June 07, 2001 6:51 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: UTF-8 syntax
I don't get the point
semantics
is totally independent of its character set, which enables you to build
portable NCHAR applications that use either AL16UTF16 or UTF8, whichever
saves more storage space for a particular locale.
Regards,
Jianping.
Carl
-----Original Message-----
From: Jianping Yang
the same character by searching for D800 DC00 in
UTF-16.
Unless unpaired surrogates are entirely eliminated from the UTF forms, this
issue can arise.
Regards,
Jianping.
Ayers, Mike wrote:
From: Jianping Yang [mailto:[EMAIL PROTECTED]]
This will fix the following problem for example
Ken,
Thanks, your comment could close this argument against UTF-8S syntax, as the attack
here is now groundless: there is no need to encode ED A0 80 and ED B0
80 as separate *paired* surrogates in UTF-8S, and they will always be converted
into 0x10000 in UTF-32 or F0 90 80 80 in UTF-8.
I don't get the point of this argument, as UTF-8S maps exactly onto UTF-16
code unit by code unit, which means one UTF-16 code unit is mapped to either one,
two, or three bytes in UTF-8S. So if you are saying there is ambiguity in
UTF-8S, it should also apply to UTF-16, which does not make sense.
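The one-to-one mapping described here (one UTF-16 code unit to one, two, or three bytes) can be expressed as a length rule. This Python sketch uses hypothetical helper names and simply applies the standard UTF-8 length thresholds to each UTF-16 code unit, with surrogates (D800 to DFFF) falling into the three-byte range; it illustrates the stated rule and is not taken from any specification text.

```python
def utf8s_unit_length(unit: int) -> int:
    """Bytes used for one UTF-16 code unit; surrogates fall into the 3-byte range."""
    if unit < 0x80:
        return 1
    if unit < 0x800:
        return 2
    return 3


def utf8s_length(text: str) -> int:
    """Total UTF-8S-style length: sum of per-code-unit lengths."""
    raw = text.encode("utf-16-be")
    units = (int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2))
    return sum(utf8s_unit_length(u) for u in units)


for sample in ["A", "\u00e9", "\u4e2d", "\U00010000"]:
    print(f"U+{ord(sample):04X}", utf8s_length(sample), len(sample.encode("utf-8")))
```

For BMP text the two lengths agree; they diverge only for supplementary characters, where the surrogate pair costs six bytes instead of standard UTF-8's four.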
Antoine Leca wrote:
Jianping Yang wrote:
As a matter of fact, the surrogate or supplementary character was not defined
in the past,
How long is the past? I remember reading about these surrogates the first
time I put my hands on a draft copy of ISO 10646. It was nearly six years ago
I don't want to argue over this lengthy email, but only to point out two facts:
According to the proposal, UTF-8S and UTF-32S would not have the same
status: they wouldn't be for interchange; they'd just be for representation
internal to a given system, like UTF-EBCDIC (which, I think I heard, has not
Mr. Zhang is the CEO of that company.
Regards,
Jianping.
John Jenkins wrote:
On Monday, February 26, 2001, at 09:12 PM, Richard Cook wrote:
Is there any connection between this http://www.unihan.com.cn/ site and
IRG? What is UniHan Digital Tech Co.? Their website has some rather
annoying
That's not quite accurate regarding Oracle's Unicode support. As no surrogate
characters are defined yet, Oracle intends to use a 3-byte encoding for UTF-8 for
performance and semantic reasons. To keep the same binary order as UTF-16, which is
commonly used in NT and Java clients, Oracle UTF8 character
From our performance measurements, data size is the bottleneck for performance
in database applications, as the majority of operations insert, update, and
retrieve data. If most of the information in your database is English and Western
European text, it is better to use UTF-8 as the database
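The storage-size trade-off behind this advice can be illustrated by counting raw encoded bytes. This Python sketch uses made-up sample strings and a hypothetical `byte_sizes` helper; it ignores any per-row or per-column overhead a real database adds, and it uses UTF-16LE only to avoid counting a byte order mark.

```python
def byte_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for a string, with no BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical sample strings, one per kind of data mentioned in the thread.
samples = {
    "English": "Hello, world",
    "Western European": "Grüße aus Zürich",
    "Japanese": "日本語のテキスト",
}

for label, text in samples.items():
    utf8_len, utf16_len = byte_sizes(text)
    print(f"{label}: {len(text)} chars, UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

ASCII and mostly-Latin text is roughly half the size in UTF-8, while CJK text in the BMP takes three bytes per character in UTF-8 versus two in UTF-16, which is why the choice depends on the locale of the data.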
Not really for Unicode, in which some Hangul code points were relocated
between Unicode 1.1 and 2.0 :)
Regards,
Jianping.
"Christopher J. Fynn" wrote:
Sandro
I'm sure someone official will give you an official answer, but I know the only
answer you are going to get to your question is
Mikko,
As no characters are defined in the surrogate range in Unicode 3.0,
the maximum width for the Oracle UTF8 character set is 3 bytes. I recommend
sizing a column at 3 times the number of characters you intend to store
in it.
Regards,
Jianping.
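The 3-byte figure and the sizing rule above can be checked with a short Python sketch. It scans every BMP code point except the surrogate block (which holds no characters) and confirms none needs more than three bytes in UTF-8; `max_utf8_width_bmp` and `column_bytes` are hypothetical helper names, not Oracle functions.

```python
def max_utf8_width_bmp() -> int:
    """Largest UTF-8 length of any BMP code point, skipping surrogates D800-DFFF."""
    return max(
        len(chr(cp).encode("utf-8"))
        for cp in range(0x10000)
        if not 0xD800 <= cp <= 0xDFFF
    )


def column_bytes(n_chars: int, max_width: int = 3) -> int:
    """Byte length to declare for a column holding n_chars BMP characters."""
    return n_chars * max_width


print(max_utf8_width_bmp())  # 3
print(column_bytes(100))     # 300
```

The rule only holds while data stays in the BMP: a supplementary character already takes four bytes in standard UTF-8.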
Mikko Lahti wrote:
What is the correct way