Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

2001-02-05 Thread DougEwell2
In a message dated 2001-02-05 5:19:59 Pacific Standard Time, [EMAIL PROTECTED] writes: I have heard a rumour (i.e. my source is not involved in the reported activity) that: quote SAP, PeopleSoft, Siebel, Oracle and others are actually in the process of proposing a new format

Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

2001-02-05 Thread John O'Conner
Within a String, the encoding of char values is practically irrelevant. It is a hidden encoding that is never exposed to the user...or developer. When you access String char values, you use an index to 16-bit Unicode values. To my knowledge, Sun does not claim that its internal encoding of String
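
The point about indexing into 16-bit values can be illustrated with a minimal sketch. The class name `CharIndexDemo` and the helper `units` are hypothetical names chosen for this example; only `String.charAt` and `String.length` are actual Java API.

```java
public class CharIndexDemo {
    // Returns the 16-bit value found at each index of s, as four hex digits each.
    static String units(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // charAt exposes a 16-bit char regardless of how the String is stored internally.
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Letter A followed by the euro sign.
        System.out.println(units("A\u20ac")); // prints "0041 20ac"
    }
}
```

Nothing in this sketch depends on the internal storage format of the String, which is the sense in which that encoding is hidden from the developer.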

Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

2001-02-05 Thread John Cowan
John O'Conner wrote: Within a String, the encoding of char values is practically irrelevant. It is a hidden encoding that is never exposed to the user...or developer. When you access String char values, you use an index to 16-bit Unicode values. To my knowledge, Sun does not claim that its

Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

2001-02-05 Thread Tex Texin
John, It does impact developers. The API for DataInputStream defines FSS_UTF, which includes the funky null behavior. http://java.sun.com/products/jdk/1.2/docs/api/java/io/DataInputStream.html Since this API and others use this UTF, it gets into file formats and applications end up supporting

Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

2001-02-05 Thread John O'Conner
Perhaps the methods readUTF and writeUTF should be deprecated in favor of read/writeString. I will submit an RFE (request for enhancement) for this. I noticed that, although the Data{Input,Output} interface clearly says that write/readUTF handles a "Java modified UTF-8", the actual javadoc in

Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

2001-02-05 Thread Tex Texin
John, I am not clear from your comments which is the bug, since the doc goes both ways. Are the doc bugs that they say it is UTF-8, or that they say it is modified UTF-8? It would be great to learn that the functions are actually unmodified UTF-8, as I know of some interfaces that are writing

Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

2001-02-05 Thread John Cowan
Tex Texin wrote: I am not clear from your comments which is the bug, since the doc goes both ways. Are the doc bugs that they say it is UTF-8, or that they say it is modified UTF-8? It uses modified UTF-8, modified in three ways: 1) U+0000 is encoded in two bytes as 0xc0 0x80; 2) values
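
The first modification listed above (U+0000 written as the two bytes 0xc0 0x80) can be observed directly. This is a minimal sketch, not part of the original message: the class name `ModifiedUtf8Demo` and the helpers `modifiedUtf8` and `hex` are hypothetical, and it covers only that first item. It relies on `DataOutputStream.writeUTF` prefixing its output with a two-byte length, which the helper strips.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    // Returns the bytes writeUTF emits for s, without the 2-byte length prefix.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for simplicity.
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\u0000";
        // Modified UTF-8 as written by writeUTF: two bytes.
        System.out.println(hex(modifiedUtf8(nul))); // prints "c080"
        // Standard UTF-8: a single zero byte.
        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8))); // prints "00"
    }
}
```

For plain ASCII text other than U+0000 the two encodings produce identical bytes, which is one reason the difference is easy to miss until such data reaches a file format.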

Re: Bastardizations of UTF-8 (was: Re: [OT] Unicode-compatible SQL?)

2001-02-05 Thread John O'Conner
Here's what I see about the Java API docs: 1. The Data{Input, Output}Stream methods {read, write}UTF could be named better. More appropriate names are {read, write}String. Strictly speaking, this is not a bug, but it could be better. That's why I call it an RFE (request for enhancement). 2.