Re: Playing with Unicode (was: Re: UTF-17)

2001-06-26 Thread Jianping Yang
Carl W. Brown wrote: Jianping, In fact, Oracle 8.0 development started in 1992 and it was released in 1994, which should be much earlier than NT 5.0. Back then I was still using Oracle 7. Thank you for correcting me. What made you choose UTF-8 back in 1992? Oracle 7 supports

Re: Playing with Unicode (was: Re: UTF-17)

2001-06-26 Thread Jianping Yang
Carl W. Brown wrote: I warned my clients not to use surrogates with Oracle 8.x data bases. I also can not see that they could be so short sighted not to develop a full UTF-8 encoder. If MS can put surrogate support into Windows 2000, then they can put it into Oracle 8.0. I am sure that

Re: UTF8 encoding - What should I tell my customers?

2001-06-19 Thread Jianping Yang
Carl W. Brown wrote: If there are no surrogates in the database, is there any reason that I can not change the database from UTF8 to AL32UTF8? You can change the database from UTF8 to AL32UTF8 in this case. Also you can use Oracle database scanner to scan your UTF8 database and if you

Re: FSS-UTF, UTF-2, UTF-8, and UTF-16

2001-06-18 Thread Jianping Yang
Mark Davis wrote: You are correct about the published definitions. As I recall, though, we were referring to UTF-FSS as UTF-8 in the UTC meetings before it was changed to account for UTF-16. In any event, I don't know whether Oracle was involved in those discussions or not, or whether

Re: FSS-UTF, UTF-2, UTF-8, and UTF-16

2001-06-18 Thread Jianping Yang
Markus Scherer wrote: This means that Oracle mis-implemented the UTF-8 standard as it was specified at that time, starting at least with Unicode 2.0. No, Oracle does not mis-implement the UTF-8 standard but only limit its support to BMP only. Except the backward compatibility reason,

Re: UTF-16 problems

2001-06-12 Thread Jianping Yang
Lisa Moore wrote: Jianping wrote: only Oracle provides fully UTF-8 and UTF-16 support for RDBMS Whoa...let me interject, DB2 for OS/390 supports UTF-8 and UTF-16. And DB2 for Intel, Unix, supported both much earlier. I cannot speak to Jianping's interpretation of fully The fully

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Jianping Yang
[EMAIL PROTECTED] wrote: On 06/11/2001 10:45:46 PM Mark Davis wrote: [earlier] - Oracle could probably make a case for their name for UTF8 simply being an anachronism. After all, the original definition of UTF-8 did convert surrogate pairs as they are doing in what they call UTF8.

Re: UTF8 vs AL32UTF8

2001-06-12 Thread Jianping Yang
[EMAIL PROTECTED] wrote: On 06/12/2001 01:13:48 PM Jianping Yang wrote: If you convert ED A0 80 ED B0 80 into UTF-16, what does it mean then? I think definitely it means U-00010000. I'd say not if that 6-byte sequence is interpreted in terms of *UTF-8*. So UTF-8 is not compatible

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Jianping Yang
One thing needs to clarify here is that there is no four byte encoding in UTF-8S proposal and four byte encoding is illegal but not irregular. As everything in UTF-8S is perfect match to UTF-16, any blame to this proposal also applies to UTF-16 encoding form. Regards, Jianping. Kenneth Whistler

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Jianping Yang
Kenneth Whistler wrote: Jianping wrote: One thing needs to clarify here is that there is no four byte encoding in UTF-8S proposal and four byte encoding is illegal but not irregular. As everything in UTF-8S is perfect match to UTF-16, any blame to this proposal also applies to UTF-16

Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

2001-06-12 Thread Jianping Yang
Kenneth Whistler wrote: Jianping responded: Kenneth Whistler wrote: Jianping wrote: One thing needs to clarify here is that there is no four byte encoding in UTF-8S proposal and four byte encoding is illegal but not irregular. As everything in UTF-8S is perfect match to

Re: UTF-16 problems

2001-06-11 Thread Jianping Yang
Is this the language that should be used in a professional way? I wonder how could this happen to the Unicode mail list! Michael (michka) Kaplan wrote: From: Rick McGowan [EMAIL PROTECTED] ... asking for a lavicious license to be lecherously lazy Parse error at lavicious. No such word

Re: UTF-16 problems

2001-06-11 Thread Jianping Yang
Michael (michka) Kaplan wrote: From: Jianping Yang [EMAIL PROTECTED] If UTF-8S were to by some miracle be accepted by the UTC, implementers will be put out and offended for most of the next decade. If it is, that is rule of law from UTC. Very true. devil's advocate

Re: UTF-8 syntax

2001-06-08 Thread Jianping Yang
Ken, Your analysis makes me believe even more strongly that we need UTF-8S, not only for the binary order but also because of this ambiguity, which applies to both UTF-8S and UTF-16. As the proposed UTF-8S encoding is logically equivalent to UTF-16, they share the same property, which is different from UTF-8 and

Re: UTF-8 syntax

2001-06-08 Thread Jianping Yang
but will not work with standard UTF-8 encoders and decoders. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Jianping Yang Sent: Thursday, June 07, 2001 6:51 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: UTF-8 syntax I don't get point

Re: UTF8 vs AL32UTF8

2001-06-08 Thread Jianping Yang
semantics is totally independent of its character set, which will enable you to build portable NCHAR application that uses either AL16UTF16 or UTF8 for your benefit when considering storage space for particular locale. Regards, Jianping. Carl -Original Message- From: Jianping Yang

Re: UTF-8 syntax

2001-06-08 Thread Jianping Yang
the same character by searching D800 DC00 in UTF-16. Unless this unpaired surrogate will be totally eliminated from UTF forms, this issue could be hit. Regards, Jianping. Ayers, Mike wrote: From: Jianping Yang [mailto:[EMAIL PROTECTED]] This will fix the following problem for example

Re: UTF-8 syntax

2001-06-08 Thread Jianping Yang
Ken, Thanks, your comment could close this argument against UTF-8S syntax as the attack here is groundless now, because there is no need to encode ED A0 80 and ED B0 80 as separate *paired* surrogates in UTF-8S and they will always be converted into 0x10000 in UTF-32 or F0 90 80 80 in UTF-8.
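
The byte arithmetic behind this message can be checked with a short sketch. It assumes that each three-byte group is unpacked with the ordinary UTF-8 three-byte bit layout and that the two resulting values are combined as a UTF-16 surrogate pair; the helper name `decode_three_byte` is hypothetical and not taken from any proposal text.

```python
def decode_three_byte(group):
    """Unpack one 3-byte sequence (1110xxxx 10xxxxxx 10xxxxxx) into a 16-bit value."""
    return ((group[0] & 0x0F) << 12) | ((group[1] & 0x3F) << 6) | (group[2] & 0x3F)

# The six bytes discussed in the thread.
data = bytes.fromhex("EDA080EDB080")

high = decode_three_byte(data[0:3])  # high (leading) surrogate
low = decode_three_byte(data[3:6])   # low (trailing) surrogate

# Standard UTF-16 surrogate-pair arithmetic.
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# The same scalar value in standard (four-byte) UTF-8.
utf8 = chr(code_point).encode("utf-8")

print(hex(high), hex(low), hex(code_point), utf8.hex(" ").upper())
```

Under these assumptions the pair D800 DC00 combines to 0x10000, whose standard UTF-8 form is F0 90 80 80.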

Re: UTF-8 syntax

2001-06-07 Thread Jianping Yang
I don't get point from this argument as UTF-8S is exactly mapped to UTF-16 in UTF-16 code unit which means one UTF-16 code unit will be mapped to either one, two, or three bytes in UTF-8S. So if you are saying there is ambiguous in UTF-8S, it should also apply to UTF-16, which does not make sense
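
A minimal sketch of the mapping described here, in which every UTF-16 code unit becomes one, two, or three bytes. The function names are hypothetical, and the bit layout is assumed to be the usual UTF-8 one applied per code unit rather than per code point.

```python
def encode_code_unit(unit):
    """Encode a single 16-bit UTF-16 code unit as 1, 2, or 3 bytes."""
    if unit < 0x80:
        return bytes([unit])
    if unit < 0x800:
        return bytes([0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)])
    return bytes([0xE0 | (unit >> 12),
                  0x80 | ((unit >> 6) & 0x3F),
                  0x80 | (unit & 0x3F)])

def encode_per_code_unit(text):
    """Encode text by converting to UTF-16 and encoding each code unit separately."""
    raw = text.encode("utf-16-be")
    units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    return b"".join(encode_code_unit(u) for u in units)

for sample in ["A", "\u00E9", "\u20AC", "\U00010000"]:
    print(f"U+{ord(sample):04X}", encode_per_code_unit(sample).hex(" ").upper())
```

For BMP characters the result is identical to standard UTF-8; only supplementary characters differ, coming out as two three-byte groups instead of one four-byte sequence.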

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-29 Thread Jianping Yang
Antoine Leca wrote: Jianping Yang wrote: As a matter of fact, the surrogate or supplementary character was not defined in the past, How long is the past? I remember reading about these surrogates the first time I put my hands on a draft copy of ISO 10646. It was nearly six years ago

Re: ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

2001-05-27 Thread Jianping Yang
I don't want to argue on this lengthy email, but only point two facts: According to the proposal, UTF-8S and UTF-32S would not have the same status: they wouldn't be for interchange; they'd just be for representation internal to a given system, like UTF-EBCDIC (which, I think I heard, has not

Re: Question on Unicode data files

2001-02-28 Thread Jianping Yang
Mr Zhang is CEO of that company. Regards, Jianping. John Jenkins wrote: On Monday, February 26, 2001, at 09:12 PM, Richard Cook wrote: Is there any connection between this http://www.unihan.com.cn/ site and IRG? What is UniHan Digital Tech Co.? Their website has some rather annoying

Re: [OT] Unicode-compatible SQL?

2001-02-01 Thread Jianping Yang
That's not quite true about Oracle Unicode support. As there is no surrogate character defined yet, Oracle intended to use a 3-byte encoding for UTF-8 for performance and semantic reasons. To keep the same binary order as UTF-16, which is commonly used in NT and Java clients, Oracle UTF8 character
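
The binary-order point can be illustrated with two characters, one near the top of the BMP and one supplementary. This is only a sketch: the six-byte form is produced with Python's `surrogatepass` error handler as a stand-in for an encoder that writes each surrogate as three bytes, and the helper name `encode_surrogates_separately` is hypothetical.

```python
def encode_surrogates_separately(text):
    """Encode each UTF-16 code unit (including surrogates) as its own UTF-8-style sequence."""
    raw = text.encode("utf-16-be")
    units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    # surrogatepass lets Python emit the 3-byte form for individual surrogates.
    return "".join(chr(u) for u in units).encode("utf-8", "surrogatepass")

bmp = "\uFF5E"        # a BMP character above the surrogate range
supp = "\U00010000"   # first supplementary character

# Does the BMP character sort before the supplementary one under each encoding?
utf16_bmp_first = bmp.encode("utf-16-be") < supp.encode("utf-16-be")
utf8_bmp_first = bmp.encode("utf-8") < supp.encode("utf-8")
six_byte_bmp_first = encode_surrogates_separately(bmp) < encode_surrogates_separately(supp)

print(utf16_bmp_first, utf8_bmp_first, six_byte_bmp_first)
```

In UTF-16 the supplementary character sorts first because its leading code unit D800 is below FF5E, while in standard UTF-8 the lead byte F0 sorts after EF. The six-byte form agrees with the UTF-16 order.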

Re: Unicode on a website

2000-09-22 Thread Jianping Yang
From our performance measurements, data size is the bottleneck for performance in database applications, as the majority of operations insert, update, and retrieve data. If most of the information in your database is English and W. European characters, it is better to use UTF-8 as database

Re: is there any way to change already defined character codes?

2000-08-08 Thread Jianping Yang
Not really for Unicode in which we have relocated some codepoints for Hangul between Unicode 1.1 and 2.0 :) Regards, Jianping. "Christopher J. Fynn" wrote: Sandro I'm sure someone official will give you an official answer, but I know the only answer you are going to get to your question is

Re: Oracle and Surrogate Pairs

2000-07-24 Thread Jianping Yang
Mikko, As there is no character defined in the surrogate range in Unicode 3.0, the maximum width for the Oracle UTF8 character set is 3 bytes. I recommend sizing a column at 3 times the number of characters you intend to store in it. Regards, Jianping. Mikko Lahti wrote: What is the correct way
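
The sizing rule in this reply can be sketched as simple arithmetic. The function and constant names are hypothetical, and the check covers only BMP code points, matching the message's premise that no characters are assigned outside the BMP.

```python
MAX_BYTES_PER_BMP_CHAR = 3  # widest UTF-8 encoding of any BMP code point

def column_byte_width(char_count, bytes_per_char=MAX_BYTES_PER_BMP_CHAR):
    """Byte width needed to hold char_count characters in the worst case."""
    return char_count * bytes_per_char

# Confirm the 3-byte bound over the whole BMP, skipping the surrogate range.
widest = max(
    len(chr(cp).encode("utf-8"))
    for cp in range(0x10000)
    if not 0xD800 <= cp <= 0xDFFF
)

print(widest, column_byte_width(20))
```

A column meant to hold 20 characters would therefore be declared with room for 60 bytes.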