Brian Jones wrote:
> > As I recall Unicode now requires more bits than a Java 'char' allows.
> I don't know that helps at all? I don't really know what Sun's
> solution is. It looks like we did update to unicode data 3.0, but I
> know our implementation fails many Mauve tests related to Character.
Unicode 3.1 introduced characters above U+FFFF, which UTF-16 can only reach through the surrogate space, and the upcoming 3.2 adds even more. Each of these characters requires two 16-bit code units, a surrogate pair (the first in \ud800 - \udbff, the second in \udc00 - \udfff). And Java does ignore them: the standard 4-byte UTF-8 sequences are illegal in class files (you have to use a 6-byte sequence instead, encoding each surrogate separately in 3 bytes), and Java identifiers may not include surrogate characters. Sun would need to add more methods to the API to use them, because the point of surrogates is that two chars together have semantic meaning, while one alone is an error. For example, it is impossible to tell if \ud820 in isolation is part of a letter, number, or punctuation. So for now, Sun's "solution" is to stall. I did verify today that JDK 1.4 is still on Unicode 3.0.0.

The implementation of Character that I just checked in to Classpath is identical in behavior to Sun's (fortunately, testing every method on all 64k chars is not terribly time-consuming). However, I could not run it through Mauve, as I still have been unable to compile a free VM on Cygwin, and Sun's VM doesn't like me replacing core classes like Character. But if Character fails any tests in Mauve now, then I would suspect that Mauve has the bugs.

-- 
This signature intentionally left boring.
Eric Blake             [EMAIL PROTECTED]
BYU student, free software programmer

_______________________________________________
Classpath mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/classpath
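To make the surrogate arithmetic and the class-file encoding in the message above concrete, here is a minimal Java sketch. It assumes a modern JDK (it uses StandardCharsets and UncheckedIOException, neither of which existed in JDK 1.4), and the class and method names (SurrogateDemo, toSurrogates, fromSurrogates, modifiedUtf8Length) are hypothetical, not part of Classpath or Sun's API. It splits U+1D11E, a musical symbol added in Unicode 3.1, into its surrogate pair by hand, reassembles it, and compares the standard UTF-8 length with the modified UTF-8 that DataOutputStream.writeUTF produces.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative, not from Classpath or the JDK.
class SurrogateDemo {

    // Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point: " + codePoint);
        }
        int v = codePoint - 0x10000;              // 20-bit offset
        char high = (char) (0xD800 + (v >> 10));  // top 10 bits: 0xD800..0xDBFF
        char low = (char) (0xDC00 + (v & 0x3FF)); // bottom 10 bits: 0xDC00..0xDFFF
        return new char[] { high, low };
    }

    // Recombine a high/low surrogate pair into the code point it encodes.
    static int fromSurrogates(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    // Byte count from DataOutputStream.writeUTF, which emits Java's modified UTF-8
    // (the form used in class files); the 2-byte length prefix is not counted.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            new DataOutputStream(buf).writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return buf.size() - 2;
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // MUSICAL SYMBOL G CLEF, added in Unicode 3.1
        char[] pair = toSurrogates(clef);
        String s = new String(pair);

        System.out.printf("U+%X -> \\u%04x \\u%04x%n", clef, (int) pair[0], (int) pair[1]);
        System.out.println("round trip ok: " + (fromSurrogates(pair[0], pair[1]) == clef));
        System.out.println("standard UTF-8 bytes: " + s.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("modified UTF-8 bytes: " + modifiedUtf8Length(s));
        // A lone surrogate has no letter/digit/punctuation class of its own.
        System.out.println("lone 0xD820 is SURROGATE: "
                + (Character.getType((char) 0xD820) == Character.SURROGATE));
    }
}
```

Run on a current JDK, this should show U+1D11E splitting into \ud834 \udd1e, taking 4 bytes in standard UTF-8 but 6 in modified UTF-8, and Character.getType reporting only SURROGATE for a lone \ud820, which is all the information a single char can carry.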

