From: http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-textrep
The primitive data type char in the Java programming language is an unsigned 16-bit integer that can represent a Unicode code point in the range U+0000 to U+FFFF, or the code units of UTF-16 <http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#utf-16>.

Confusing eh? I guess you would call what Java uses internally a UTF-16 subset. So, technically not UTF-16, but practically UTF-16 (a two-byte UTF-16 subset).

On Tue, Aug 28, 2012 at 11:18 AM, Paul Gilmartin <[email protected]> wrote:

> On Tue, 28 Aug 2012 10:39:03 -0500, Kirk Wolf wrote:
> >
> >UTF-16 is used in Java (and other languages) as the internal representation
> >of characters and strings (each character represented by two bytes).
> >
> No.  Not according to:
>
>     http://en.wikipedia.org/wiki/UTF-16
>
>     UTF-16 (16-bit Unicode Transformation Format) is a character encoding for
>     Unicode capable of encoding 1,112,064[1] numbers (called code points) in the
>     Unicode code space from 0 to 0x10FFFF. It produces a variable-length result
>     of either one or two 16-bit code units per code point.
>
> And:
>
>     http://www.ietf.org/rfc/rfc2781.txt
>
>     The rules for how characters are encoded in UTF-16 are:
>
>     - Characters with values less than 0x10000 are represented as a
>       single 16-bit integer with a value equal to that of the character
>       number.
>
>     - Characters with values between 0x10000 and 0x10FFFF are
>       represented by a 16-bit integer with a value between 0xD800 and
>       0xDBFF (within the so-called high-half zone or high surrogate
>       area) followed by a 16-bit integer with a value between 0xDC00 and
>       0xDFFF (within the so-called low-half zone or low surrogate area).
>
>     - Characters with values greater than 0x10FFFF cannot be encoded in
>       UTF-16.
>
> -- gil
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN
> ----------------------------------------------------------------------
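
For anyone who wants to see the RFC 2781 rules quoted above next to Java's 16-bit char, here is a minimal sketch. The class name Utf16Sketch and the method encode are my own, made up for illustration; they are not part of any library. The only JDK calls it relies on are String.length(), String.codePointCount() and Character.toChars(). It hand-encodes one code point above U+FFFF and shows that a single char holds only one code unit of it, while a String holds both.

```java
import java.util.Arrays;

public class Utf16Sketch {
    // Hypothetical helper: encode one code point into UTF-16 code units,
    // following the three rules quoted from RFC 2781.
    // Note: it does not reject values in 0xD800..0xDFFF, which are reserved
    // for surrogates and are not characters in their own right.
    public static char[] encode(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            // Rule 3: values above 0x10FFFF cannot be encoded in UTF-16.
            throw new IllegalArgumentException("cannot be encoded in UTF-16");
        }
        if (codePoint < 0x10000) {
            // Rule 1: a single 16-bit unit equal to the character number.
            return new char[] { (char) codePoint };
        }
        // Rule 2: split the 20-bit offset into a high and a low surrogate.
        int offset = codePoint - 0x10000;
        char high = (char) (0xD800 + (offset >> 10));   // upper 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF)); // lower 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // MUSICAL SYMBOL G CLEF, outside U+0000..U+FFFF
        char[] units = encode(gClef);
        System.out.printf("U+%X -> %04X %04X%n",
                gClef, (int) units[0], (int) units[1]);

        String s = new String(units);
        // length() counts 16-bit code units, not characters.
        System.out.println("length()         = " + s.length());
        System.out.println("codePointCount() = " + s.codePointCount(0, s.length()));
        // The JDK's own conversion yields the same pair of code units.
        System.out.println("matches toChars  = "
                + Arrays.equals(units, Character.toChars(gClef)));
    }
}
```

So one char is always one 16-bit code unit, and a code point above U+FFFF takes two of them in a String. length() reports 2 for the G clef and codePointCount() reports 1.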
