Sherman,

In the spirit of open source development and the whole OpenJDK, I offer all you hardworking folks this patch to j.l.Character's embedded javadoc. (I also have some comments on the code, but those I'll send under separate cover.)

I set out to fix nothing more than the "errors of commission", meaning the factual misstatements, contained in the class's documentation. But while I was in there, I couldn't help but also address a few things that one might, in contrast, call "errors of omission". Between the two sorts of fixes, I think it makes your document a lot more accurate, and therefore a lot more useful.

In all this, I kept to the same style and tone found in the existing text. I also fixed a very few typos, but those I wouldn't have bothered you with. This is a very brief patch.

Next I'll fix j.u.regex.Pattern's documentation, but that, I am afraid, is going to take a bit more work than this did, which was really fast and easy to fix, all things considered.

Hope this helps!

--tom
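For anyone who wants to see the distinction the patch draws between lowercase letters (GC=Ll) and the other lowercase characters, here is a minimal sketch. The class name `GeneralCategoryCheck` and the helper `isLowercaseLetter` are made up for illustration and are not part of the JDK; the sketch relies only on `Character.getType(int)` and the category constants. Whether `Character.isLowerCase` itself agrees with this helper may depend on the JDK release, so no output for it is shown.

```java
public class GeneralCategoryCheck {
    // Hypothetical helper: true only for GC=Ll, which is the definition
    // the javadoc gives for a lowercase letter.
    static boolean isLowercaseLetter(int codePoint) {
        return Character.getType(codePoint) == Character.LOWERCASE_LETTER;
    }

    public static void main(String[] args) {
        int[] samples = {
            'a',     // LATIN SMALL LETTER A (GC=Ll)
            0x02B0,  // MODIFIER LETTER SMALL H (GC=Lm)
            0x2170,  // SMALL ROMAN NUMERAL ONE (GC=Nl)
            0x24D0,  // CIRCLED LATIN SMALL LETTER A (GC=So)
            0x0345   // COMBINING GREEK YPOGEGRAMMENI (GC=Mn)
        };
        for (int cp : samples) {
            // Print the raw general category value next to the GC=Ll test.
            System.out.printf("U+%04X type=%d lowercaseLetter=%b%n",
                    cp, Character.getType(cp), isLowercaseLetter(cp));
        }
    }
}
```

Only the first sample is GC=Ll; the other four look lowercase but fall in the categories noted in the comments.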
--- java_lang_Character.java 2011-04-14 17:15:17.000000000 -0600 +++ java_lang_Character.java-EDIT 2011-04-14 19:41:19.000000000 -0600 @@ -59,14 +59,14 @@ * <p>The <code>char</code> data type (and therefore the value that a * <code>Character</code> object encapsulates) are based on the * original Unicode specification, which defined characters as - * fixed-width 16-bit entities. The Unicode standard has since been + * fixed-width 16-bit entities. The Unicode Standard has since been * changed to allow for characters whose representation requires more * than 16 bits. The range of legal <em>code point</em>s is now * U+0000 to U+10FFFF, known as <em>Unicode scalar value</em>. * (Refer to the <a * href="http://www.unicode.org/reports/tr27/#notation"><i> * definition</i></a> of the U+<i>n</i> notation in the Unicode - * standard.) + * Standard.) * * <p><a name="BMP">The set of characters from U+0000 to U+FFFF is * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>. @@ -5198,13 +5198,14 @@ } /** - * Determines if the specified character is a lowercase character. + * Determines if the specified character (Java <code>char</code>) + * is a lowercase letter. * <p> - * A character is lowercase if its general category type, provided - * by <code>Character.getType(ch)</code>, is + * A character is a lowercase letter (GC=Ll) if its general category + * type, provided by <code>Character.getType(ch)</code>, is * <code>LOWERCASE_LETTER</code>. * <p> - * The following are examples of lowercase characters: + * The following are examples of lowercase letters: * <p><blockquote><pre> * a b c d e f g h i j k l m n o p q r s t u v w x y z * '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6' @@ -5212,7 +5213,14 @@ * '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6' * '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF' * </pre></blockquote> - * <p> Many other Unicode characters are lowercase too. 
+ * <p> Many other Unicode characters are lowercase, too, including many + * modifier letters and subscripts (which are GC=Lm, not GC=Ll), some + * Roman numerals (GC=Nl), some circled letters (GC=So in the + * Block=Enclosed_Alphanumerics), and even one combining character, + * U+0345 COMBINING GREEK YPOGEGRAMMENI (GC=Mn). However, because + * those lowercase characters are not lowercase <i>letters</i>, this + * method will not identify them as lowercase. There are 159 such code + * points as of Unicode 6.0.<p> * * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support @@ -5232,14 +5240,14 @@ } /** - * Determines if the specified character (Unicode code point) is a - * lowercase character. + * Determines if the specified Unicode code point is a + * lowercase letter. * <p> - * A character is lowercase if its general category type, provided - * by {@link Character#getType getType(codePoint)}, is + * A character is a lowercase letter (GC=Ll) if its general category type, + * provided by {@link Character#getType getType(codePoint)}, is * <code>LOWERCASE_LETTER</code>. * <p> - * The following are examples of lowercase characters: + * The following are examples of lowercase letters: * <p><blockquote><pre> * a b c d e f g h i j k l m n o p q r s t u v w x y z * '\u00DF' '\u00E0' '\u00E1' '\u00E2' '\u00E3' '\u00E4' '\u00E5' '\u00E6' @@ -5247,7 +5255,14 @@ * '\u00EF' '\u00F0' '\u00F1' '\u00F2' '\u00F3' '\u00F4' '\u00F5' '\u00F6' * '\u00F8' '\u00F9' '\u00FA' '\u00FB' '\u00FC' '\u00FD' '\u00FE' '\u00FF' * </pre></blockquote> - * <p> Many other Unicode characters are lowercase too. + * <p> Many other Unicode characters are lowercase, too, including many + * modifier letters and subscripts (which are GC=Lm, not GC=Ll), some + * Roman numerals (GC=Nl), some circled letters (GC=So in the + * Block=Enclosed_Alphanumerics), and even one combining character, + * U+0345 COMBINING GREEK YPOGEGRAMMENI (GC=Mn). 
However, because + * those lowercase characters are not lowercase <i>letters</i>, this + * method will not identify them as lowercase. There are 159 such code + * points as of Unicode 6.0.<p> * * @param codePoint the character (Unicode code point) to be tested. * @return <code>true</code> if the character is lowercase; @@ -5263,12 +5278,12 @@ } /** - * Determines if the specified character is an uppercase character. + * Determines if the specified character (Java <code>char</code>) is an uppercase letter. * <p> - * A character is uppercase if its general category type, provided by + * A character is an uppercase letter (GC=Lu) if its general category type, provided by * <code>Character.getType(ch)</code>, is <code>UPPERCASE_LETTER</code>. * <p> - * The following are examples of uppercase characters: + * The following are examples of uppercase letters: * <p><blockquote><pre> * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z * '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7' @@ -5276,7 +5291,12 @@ * '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8' * '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE' * </pre></blockquote> - * <p> Many other Unicode characters are uppercase too.<p> + * <p>Many other Unicode characters are uppercase, too, including some + * Roman numerals (which are GC=Nl, not GC=Lu) and some circled + * letters (GC=So in the Block=Enclosed_Alphanumerics). However, + * because those uppercase characters are not uppercase + * <i>letters</i>, this method will not identify them as being + * uppercase. There are 42 such characters as of Unicode 6.0.<p> * * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support @@ -5297,12 +5317,12 @@ } /** - * Determines if the specified character (Unicode code point) is an uppercase character. + * Determines if the specified Unicode code point is an uppercase letter. 
* <p> - * A character is uppercase if its general category type, provided by + * A character is an uppercase letter (GC=Lu) if its general category type, provided by * {@link Character#getType(int) getType(codePoint)}, is <code>UPPERCASE_LETTER</code>. * <p> - * The following are examples of uppercase characters: + * The following are examples of uppercase letters: * <p><blockquote><pre> * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z * '\u00C0' '\u00C1' '\u00C2' '\u00C3' '\u00C4' '\u00C5' '\u00C6' '\u00C7' @@ -5310,7 +5330,12 @@ * '\u00D0' '\u00D1' '\u00D2' '\u00D3' '\u00D4' '\u00D5' '\u00D6' '\u00D8' * '\u00D9' '\u00DA' '\u00DB' '\u00DC' '\u00DD' '\u00DE' * </pre></blockquote> - * <p> Many other Unicode characters are uppercase too.<p> + * <p>Many other Unicode characters are uppercase, too, including some + * Roman numerals (which are GC=Nl, not GC=Lu) and some circled + * letters (GC=So in the Block=Enclosed_Alphanumerics). However, + * because those uppercase characters are not uppercase + * <i>letters</i>, this method will not identify them as being + * uppercase. There are 42 such characters as of Unicode 6.0.<p> * * @param codePoint the character (Unicode code point) to be tested. * @return <code>true</code> if the character is uppercase; @@ -5326,14 +5351,14 @@ } /** - * Determines if the specified character is a titlecase character. + * Determines if the specified character (Java <code>char</code>) is a titlecase letter. * <p> - * A character is a titlecase character if its general + * A character is a titlecase letter (GC=Lt) if its general * category type, provided by <code>Character.getType(ch)</code>, * is <code>TITLECASE_LETTER</code>. * <p> - * Some characters look like pairs of Latin letters. For example, there - * is an uppercase letter that looks like "LJ" and has a corresponding + * Some characters look like pairs of letters. 
For example, there + * is an uppercase Latin letter that looks like "LJ" and has a corresponding * lowercase letter that looks like "lj". A third form, which looks like "Lj", * is the appropriate form to use when rendering a word in lowercase * with initial capitals, as for a book title. @@ -5345,8 +5370,12 @@ * <li><code>LATIN CAPITAL LETTER L WITH SMALL LETTER J</code> * <li><code>LATIN CAPITAL LETTER N WITH SMALL LETTER J</code> * <li><code>LATIN CAPITAL LETTER D WITH SMALL LETTER Z</code> + * <li><code>GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI</code> + * <li><code>GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI</code> * </ul> - * <p> Many other Unicode characters are titlecase too.<p> + * <p> Many other Unicode characters are titlecase letters, too. + * As of Unicode 6.0, there are 31 titlecase characters, all of + * which are titlecase letters.<p> * * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support @@ -5367,14 +5396,14 @@ } /** - * Determines if the specified character (Unicode code point) is a titlecase character. + * Determines if the specified Unicode code point is a titlecase letter. * <p> - * A character is a titlecase character if its general - * category type, provided by {@link Character#getType(int) getType(codePoint)}, + * A character is a titlecase letter (GC=Lt) if its general + * category type, provided by {@link Character#getType(int) getType(codePoint)}, * is <code>TITLECASE_LETTER</code>. * <p> - * Some characters look like pairs of Latin letters. For example, there - * is an uppercase letter that looks like "LJ" and has a corresponding + * Some characters look like pairs of letters. For example, there + * is an uppercase Latin letter that looks like "LJ" and has a corresponding * lowercase letter that looks like "lj". A third form, which looks like "Lj", * is the appropriate form to use when rendering a word in lowercase * with initial capitals, as for a book title. 
@@ -5386,8 +5415,12 @@ * <li><code>LATIN CAPITAL LETTER L WITH SMALL LETTER J</code> * <li><code>LATIN CAPITAL LETTER N WITH SMALL LETTER J</code> * <li><code>LATIN CAPITAL LETTER D WITH SMALL LETTER Z</code> + * <li><code>GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI</code> + * <li><code>GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI</code> * </ul> - * <p> Many other Unicode characters are titlecase too.<p> + * <p> Many other Unicode characters are titlecase letters, too. + * As of Unicode 6.0, there are 31 titlecase characters, all of + * which are titlecase letters.<p> * * @param codePoint the character (Unicode code point) to be tested. * @return <code>true</code> if the character is titlecase; @@ -5405,7 +5438,7 @@ /** * Determines if the specified character is a digit. * <p> - * A character is a digit if its general category type, provided + * A character is a digit (GC=Nd) if its general category type, provided * by <code>Character.getType(ch)</code>, is * <code>DECIMAL_DIGIT_NUMBER</code>. * <p> @@ -5444,7 +5477,7 @@ /** * Determines if the specified character (Unicode code point) is a digit. * <p> - * A character is a digit if its general category type, provided + * A character is a digit (GC=Nd) if its general category type, provided * by {@link Character#getType(int) getType(codePoint)}, is * <code>DECIMAL_DIGIT_NUMBER</code>. * <p> @@ -5529,7 +5562,7 @@ } /** - * Determines if the specified character is a letter. + * Determines if the specified character (Java <code>char</code>) is a letter. * <p> * A character is considered to be a letter if its general * category type, provided by <code>Character.getType(ch)</code>, @@ -5542,13 +5575,20 @@ * <li> <code>OTHER_LETTER</code> * </ul> * - * Not all letters have case. Many characters are - * letters but are neither uppercase nor lowercase nor titlecase. + * Not all letters have case, and not all cased characters are + * letters. 
Many characters are letters but are neither uppercase + * (GC=Lu) nor lowercase (GC=Ll) nor titlecase (GC=Lt). Letters + * without case are either Modifier_Letters (GC=Lm) or Other_Letters + * (GC=Lo), but some modifier letters <i>do</i> have case. Similarly, not all + * characters with case are letters, such as the Roman numerals, which + * are Letter_Numbers (GC=Nl) and the circled letters, which are + * Other_Symbols (GC=So). There are 201 cased characters as of + * Unicode 6.0 that are neither uppercase, lowercase, nor titlecase letters. * * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use - * the {@link #isLetter(int)} method. + * the {@link #isLetter(int)} method.<p> * * @param ch the character to be tested. * @return <code>true</code> if the character is a letter; @@ -5581,8 +5621,15 @@ * <li> <code>OTHER_LETTER</code> * </ul> * - * Not all letters have case. Many characters are - * letters but are neither uppercase nor lowercase nor titlecase. + * Not all letters have case, and not all cased characters are + * letters. Many characters are letters but are neither uppercase + * (GC=Lu) nor lowercase (GC=Ll) nor titlecase (GC=Lt). Letters + * without case are either Modifier_Letters (GC=Lm) or Other_Letters + * (GC=Lo), but some modifier letters <i>do</i> have case. Similarly, not all + * characters with case are letters, such as the Roman numerals, which + * are Letter_Numbers (GC=Nl) and the circled letters, which are + * Other_Symbols (GC=So). There are 201 cased characters as of + * Unicode 6.0 that are neither uppercase, lowercase, nor titlecase letters.<p> * * @param codePoint the character (Unicode code point) to be tested. * @return <code>true</code> if the character is a letter; @@ -5606,7 +5653,7 @@ } /** - * Determines if the specified character is a letter or digit. 
+ * Determines if the specified character (Java <code>char</code>) is a letter or digit. * <p> * A character is considered to be a letter or digit if either * <code>Character.isLetter(char ch)</code> or @@ -5661,14 +5708,14 @@ } /** - * Determines if the specified character is permissible as the first + * Determines if the specified character (Java <code>char</code>) is permissible as the first * character in a Java identifier. * <p> * A character may start a Java identifier if and only if * one of the following is true: * <ul> * <li> {@link #isLetter(char) isLetter(ch)} returns <code>true</code> - * <li> {@link #getType(char) getType(ch)} returns <code>LETTER_NUMBER</code> + * <li> {@link #getType(char) getType(ch)} returns <code>LETTER_NUMBER</code> (GC=Nl) * <li> ch is a currency symbol (such as "$") * <li> ch is a connecting punctuation character (such as "_"). * </ul> @@ -5691,7 +5738,7 @@ } /** - * Determines if the specified character may be part of a Java + * Determines if the specified character (Java <code>char</code>) may be part of a Java * identifier as other than the first character. * <p> * A character may be part of a Java identifier if and only if any @@ -5727,7 +5774,7 @@ } /** - * Determines if the specified character is + * Determines if the specified character (Java <code>char</code>) is * permissible as the first character in a Java identifier. * <p> * A character may start a Java identifier if and only if @@ -5787,7 +5834,7 @@ } /** - * Determines if the specified character may be part of a Java + * Determines if the specified character (Java <code>char</code>) may be part of a Java * identifier as other than the first character. * <p> * A character may be part of a Java identifier if any of the following @@ -5857,7 +5904,7 @@ } /** - * Determines if the specified character is permissible as the + * Determines if the specified character (Java <code>char</code>) is permissible as the * first character in a Unicode identifier. 
* <p> * A character may start a Unicode identifier if and only if @@ -5910,7 +5957,7 @@ } /** - * Determines if the specified character may be part of a Unicode + * Determines if the specified character (Java <code>char</code>) may be part of a Unicode * identifier as other than the first character. * <p> * A character may be part of a Unicode identifier if and only if @@ -5974,7 +6021,7 @@ } /** - * Determines if the specified character should be regarded as + * Determines if the specified character (Java <code>char</code>) should be regarded as * an ignorable character in a Java identifier or a Unicode identifier. * <p> * The following Unicode characters are ignorable in a Java identifier @@ -6039,20 +6086,34 @@ } /** - * Converts the character argument to lowercase using case - * mapping information from the UnicodeData file. + * Converts the character (Java <code>char</code>) argument to lowercase + * using case mapping information from the UnicodeData file. * <p> * Note that * <code>Character.isLowerCase(Character.toLowerCase(ch))</code> - * does not always return <code>true</code> for some ranges of - * characters, particularly those that are symbols or ideographs. + * does not return <code>true</code> for some ranges of + * lowercase characters, particularly those that are symbols or ideographs, + * or lowercase modifier letters. + * + * <p><b>Note:</b> This method cannot handle characters whose lowercase mapping + * according to the SpecialCasing file in the Unicode specification + * returns more than one character. As of Unicode 6.0, there is only + * one such code point (if locales are not considered): + * + * <ul> + * <li><code> LATIN CAPITAL LETTER I WITH DOT ABOVE</code></li> + * </ul> * * <p>In general, {@link String#toLowerCase()} should be used to map * characters to lowercase. <code>String</code> case mapping methods * have several benefits over <code>Character</code> case mapping methods. 
* <code>String</code> case mapping methods can perform locale-sensitive * mappings, context-sensitive mappings, and 1:M character mappings, whereas - * the <code>Character</code> case mapping methods cannot. + * the <code>Character</code> case mapping methods cannot. In Unicode terminology, + * <code>Character</code> case mappings are <i>simple case mappings</i> (because they + * can only map to a single character), while <code>String</code> case mappings + * are <i>full case mappings</i>, because they can map to multiple characters, + * as defined by the SpecialCasing file in the Unicode specification * * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support @@ -6071,20 +6132,33 @@ /** * Converts the character (Unicode code point) argument to - * lowercase using case mapping information from the UnicodeData + * lowercase using simple case mapping information from the UnicodeData * file. * * <p> Note that * <code>Character.isLowerCase(Character.toLowerCase(codePoint))</code> - * does not always return <code>true</code> for some ranges of - * characters, particularly those that are symbols or ideographs. + * does not return <code>true</code> for some ranges of + * lowercase characters, particularly those that are symbols or ideographs, + * or lowercase modifier letters. + * + * <p><b>Note:</b> This method cannot handle characters whose lowercase mapping + * according to the SpecialCasing file returns more than one character. + * As of Unicode 6.0, there is only one such code point, + * if locales are not considered: + * <ul> + * <li><code> LATIN CAPITAL LETTER I WITH DOT ABOVE</code></li> + * </ul> * * <p>In general, {@link String#toLowerCase()} should be used to map * characters to lowercase. <code>String</code> case mapping methods * have several benefits over <code>Character</code> case mapping methods. 
* <code>String</code> case mapping methods can perform locale-sensitive * mappings, context-sensitive mappings, and 1:M character mappings, whereas - * the <code>Character</code> case mapping methods cannot. + * the <code>Character</code> case mapping methods cannot. In Unicode terminology, + * <code>Character</code> case mappings are <i>simple case mappings</i> (because they + * can only map to a single character), while <code>String</code> case mappings + * are <i>full case mappings</i>, because they can map to multiple characters, + * as defined by the SpecialCasing file in the Unicode specification. * * @param codePoint the character (Unicode code point) to be converted. * @return the lowercase equivalent of the character (Unicode code @@ -6099,20 +6173,39 @@ } /** - * Converts the character argument to uppercase using case mapping - * information from the UnicodeData file. + * Converts the character (Java <code>char</code>) argument to + * uppercase using simple case mapping information from the + * UnicodeData file. + * * <p> * Note that * <code>Character.isUpperCase(Character.toUpperCase(ch))</code> - * does not always return <code>true</code> for some ranges of - * characters, particularly those that are symbols or ideographs. + * does not return <code>true</code> for some ranges of + * uppercase characters, particularly those that are symbols or ideographs. + * + * <p><b>Note:</b> This method cannot handle characters whose uppercase mapping + * according to the SpecialCasing file in the Unicode specification + * returns more than one character. 
Examples of such code points include: + * <ul> + * <li><code>LATIN SMALL LETTER SHARP S</code></li> + * <li><code>LATIN SMALL LETTER J WITH CARON</code></li> + * <li><code>LATIN SMALL LETTER A WITH RIGHT HALF RING</code></li> + * <li><code>GREEK SMALL LETTER UPSILON WITH PSILI</code></li> + * <li><code>LATIN SMALL LIGATURE FI</code></li> + * <li><code>LATIN SMALL LIGATURE ST</code></li> + * </ul> + * <p>As of Unicode 6.0, there are 102 such code points. * * <p>In general, {@link String#toUpperCase()} should be used to map * characters to uppercase. <code>String</code> case mapping methods * have several benefits over <code>Character</code> case mapping methods. * <code>String</code> case mapping methods can perform locale-sensitive * mappings, context-sensitive mappings, and 1:M character mappings, whereas - * the <code>Character</code> case mapping methods cannot. + * the <code>Character</code> case mapping methods cannot. In Unicode terminology, + * <code>Character</code> case mappings are <i>simple case mappings</i> because they + * can only map to a single character, while <code>String</code> case mappings + * are <i>full case mappings</i> because they can map to multiple characters, + * as defined by the SpecialCasing file in the Unicode specification * * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support @@ -6131,20 +6224,37 @@ /** * Converts the character (Unicode code point) argument to - * uppercase using case mapping information from the UnicodeData + * uppercase using simple case mapping information from the UnicodeData * file. * * <p>Note that * <code>Character.isUpperCase(Character.toUpperCase(codePoint))</code> - * does not always return <code>true</code> for some ranges of - * characters, particularly those that are symbols or ideographs. + * does not return <code>true</code> for some ranges of + * uppercase characters, particularly those that are symbols or ideographs. 
+ + * <p><b>Note:</b> This method cannot handle characters whose uppercase mapping + * according to the SpecialCasing file returns more than one character. + * Examples of such code points include: + * <ul> + * <li><code>LATIN SMALL LETTER SHARP S</code></li> + * <li><code>LATIN SMALL LETTER J WITH CARON</code></li> + * <li><code>LATIN SMALL LETTER A WITH RIGHT HALF RING</code></li> + * <li><code>GREEK SMALL LETTER UPSILON WITH PSILI</code></li> + * <li><code>LATIN SMALL LIGATURE FI</code></li> + * <li><code>LATIN SMALL LIGATURE ST</code></li> + * </ul> + * <p>As of Unicode 6.0, there are 102 such code points. * * <p>In general, {@link String#toUpperCase()} should be used to map * characters to uppercase. <code>String</code> case mapping methods * have several benefits over <code>Character</code> case mapping methods. * <code>String</code> case mapping methods can perform locale-sensitive * mappings, context-sensitive mappings, and 1:M character mappings, whereas - * the <code>Character</code> case mapping methods cannot. + * the <code>Character</code> case mapping methods cannot. In Unicode terminology, + * <code>Character</code> case mappings are <i>simple case mappings</i> because they + * can only map to a single character, while <code>String</code> case mappings + * are <i>full case mappings</i> because they can map to multiple characters + * as defined by the SpecialCasing file in the Unicode specification. * * @param codePoint the character (Unicode code point) to be converted. * @return the uppercase equivalent of the character, if any; @@ -6159,25 +6269,39 @@ } /** - * Converts the character argument to titlecase using case mapping - * information from the UnicodeData file. If a character has no - * explicit titlecase mapping and is not itself a titlecase char - * according to UnicodeData, then the uppercase mapping is - * returned as an equivalent titlecase mapping. 
If the - * <code>char</code> argument is already a titlecase - * <code>char</code>, the same <code>char</code> value will be - * returned. + * Converts the character (Java <code>char</code>) argument to + * titlecase using simple case mapping information from the + * UnicodeData file. If a character has no explicit titlecase + * mapping and is not itself a titlecase char according to + * UnicodeData, then the simple uppercase mapping is returned as an + * equivalent titlecase mapping. Simple mapping means that only single + * character returns are possible, and any full case mapping from + * the SpecialCasing file in the Unicode specification is + * disregarded. If the <code>char</code> argument is already a + * titlecase character, that same value will be returned. + * * <p> * Note that * <code>Character.isTitleCase(Character.toTitleCase(ch))</code> - * does not always return <code>true</code> for some ranges of - * characters. + * may not always return <code>true</code> for some ranges of + * titlecase characters. * * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use * the {@link #toTitleCase(int)} method. * + * <p><b>Note:</b> This method cannot handle characters whose titlecase + * mapping according to the SpecialCasing file returns more than one character. + * Examples of such code points include: + * <ul> + * <li><code>LATIN SMALL LETTER SHARP S</code></li> + * <li><code>LATIN SMALL LETTER J WITH CARON</code></li> + * <li><code>LATIN SMALL LIGATURE FI</code></li> + * <li><code>LATIN SMALL LIGATURE ST</code></li> + * </ul> + * <p>As of Unicode 6.0, there are 48 such code points. + * * @param ch the character to be converted. * @return the titlecase equivalent of the character, if any; * otherwise, the character itself. 
@@ -6191,20 +6315,34 @@ } /** - * Converts the character (Unicode code point) argument to titlecase using case mapping - * information from the UnicodeData file. If a character has no - * explicit titlecase mapping and is not itself a titlecase char - * according to UnicodeData, then the uppercase mapping is - * returned as an equivalent titlecase mapping. If the - * character argument is already a titlecase - * character, the same character value will be - * returned. + * Converts the character (Unicode code point) argument to titlecase + * using simple case mapping information from the UnicodeData file. If a + * character has no explicit titlecase mapping and is not itself a + * titlecase char according to UnicodeData, then the simple uppercase + * mapping is returned as an equivalent titlecase mapping. Simple + * mapping means that only single-character returns are possible, + * and any full case mapping from the SpecialCasing file in the + * Unicode specification is disregarded. If the <code>int</code> + * argument is already a titlecase character, that same value will + * be returned. + * * * <p>Note that * <code>Character.isTitleCase(Character.toTitleCase(codePoint))</code> * does not always return <code>true</code> for some ranges of * characters. * + * <p><b>Note:</b> This method cannot handle characters whose titlecase + * mapping according to the SpecialCasing file returns more than one character. + * Examples of such code points include: + * <ul> + * <li><code>LATIN SMALL LETTER SHARP S</code></li> + * <li><code>LATIN SMALL LETTER J WITH CARON</code></li> + * <li><code>LATIN SMALL LIGATURE FI</code></li> + * <li><code>LATIN SMALL LIGATURE ST</code></li> + * </ul> + * <p>As of Unicode 6.0, there are 48 such code points. + * * @param codePoint the character (Unicode code point) to be converted. * @return the titlecase equivalent of the character, if any; * otherwise, the character itself. 
@@ -6306,7 +6444,7 @@ * The letters A-Z in their uppercase (<code>'\u0041'</code> through * <code>'\u005A'</code>), lowercase * (<code>'\u0061'</code> through <code>'\u007A'</code>), and - * full width variant (<code>'\uFF21'</code> through + * fullwidth variant (<code>'\uFF21'</code> through * <code>'\uFF3A'</code> and <code>'\uFF41'</code> through * <code>'\uFF5A'</code>) forms have numeric values from 10 * through 35. This is independent of the Unicode specification, @@ -6344,7 +6482,7 @@ * The letters A-Z in their uppercase (<code>'\u0041'</code> through * <code>'\u005A'</code>), lowercase * (<code>'\u0061'</code> through <code>'\u007A'</code>), and - * full width variant (<code>'\uFF21'</code> through + * fullwidth variant (<code>'\uFF21'</code> through * <code>'\uFF3A'</code> and <code>'\uFF41'</code> through * <code>'\uFF5A'</code>) forms have numeric values from 10 * through 35. This is independent of the Unicode specification, @@ -6404,11 +6542,15 @@ /** - * Determines if the specified character is a Unicode space character. - * A character is considered to be a space character if and only if - * it is specified to be a space character by the Unicode standard. This - * method returns true if the character's general category type is any of - * the following: + * Determines if the specified character (Java <code>char</code>) is + * a Unicode space separator (GC=Zs), line separator (GC=Zl), or + * paragraph separator (GC=Zp). This is <i>not</i> equivalent to the + * Unicode White_Space property, which also includes six control + * characters. A character is considered to be a space character if + * and only if it is specified to be a space, line, or paragraph + * separator by the Unicode Standard. 
This method returns true if + * the character's general category type is any of the following: + * <ul> * <li> <code>SPACE_SEPARATOR</code> * <li> <code>LINE_SEPARATOR</code> @@ -6431,10 +6573,13 @@ } /** - * Determines if the specified character (Unicode code point) is a - * Unicode space character. A character is considered to be a - * space character if and only if it is specified to be a space - * character by the Unicode standard. This method returns true if + * Determines if the specified character (Unicode code point) is + * a Unicode space separator (GC=Zs), line separator (GC=Zl), or + * paragraph separator (GC=Zp). This is <i>not</i> equivalent to the + * Unicode White_Space property, which also includes six control + * characters. A character is considered to be a space character if + * and only if it is specified to be a space, line, or paragraph + * separator by the Unicode Standard. This method returns true if * the character's general category type is any of the following: * * <ul> @@ -6475,6 +6620,8 @@ * <li> It is <code>'\u001E'</code>, U+001E RECORD SEPARATOR. * <li> It is <code>'\u001F'</code>, U+001F UNIT SEPARATOR. * </ul> + * <p><b>Note:</b> The Unicode White_Space property is not + * the same as Java whitespace. * * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support @@ -6493,11 +6640,11 @@ /** * Determines if the specified character (Unicode code point) is - * white space according to Java. A character is a Java - * whitespace character if and only if it satisfies one of the - * following criteria: + * white space according to Java, not according to Unicode. 
A + * character is a Java whitespace character if and only if it + * satisfies one of the following criteria: * <ul> - * <li> It is a Unicode space character ({@link #SPACE_SEPARATOR}, + * <li> It is a Unicode space separator character ({@link #SPACE_SEPARATOR}, * {@link #LINE_SEPARATOR}, or {@link #PARAGRAPH_SEPARATOR}) * but is not also a non-breaking space (<code>'\u00A0'</code>, * <code>'\u2007'</code>, <code>'\u202F'</code>). @@ -6511,6 +6658,8 @@ * <li> It is <code>'\u001E'</code>, U+001E RECORD SEPARATOR. * <li> It is <code>'\u001F'</code>, U+001F UNIT SEPARATOR. * </ul> + * <p><b>Note:</b> The Unicode White_Space property is not + * the same as Java whitespace. * <p> * * @param codePoint the character (Unicode code point) to be tested. @@ -6524,12 +6673,14 @@ } /** - * Determines if the specified character is an ISO control + * Determines if the specified character (Java <code>char</code>) is an ISO control * character. A character is considered to be an ISO control * character if its code is in the range <code>'\u0000'</code> * through <code>'\u001F'</code> or in the range * <code>'\u007F'</code> through <code>'\u009F'</code>. * + * <p><b>Note:</b> This is identical to the Unicode Control property (GC=Cc). + * * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support * all Unicode characters, including supplementary characters, use @@ -6548,12 +6699,14 @@ } /** - * Determines if the referenced character (Unicode code point) is an ISO control + * Determines if the specified character (Unicode code point) is an ISO control * character. A character is considered to be an ISO control * character if its code is in the range <code>'\u0000'</code> * through <code>'\u001F'</code> or in the range * <code>'\u007F'</code> through <code>'\u009F'</code>. * + * <p><b>Note:</b> This is identical to the Unicode Control property (GC=Cc). + * * @param codePoint the character (Unicode code point) to be tested. 
* @return <code>true</code> if the character is an ISO control character; * <code>false</code> otherwise. @@ -6570,7 +6723,8 @@ } /** - * Returns a value indicating a character's general category. + * Returns a value indicating the general category of the + * specified character (Java <code>char</code>). * * <p><b>Note:</b> This method cannot handle <a * href="#supplementary"> supplementary characters</a>. To support @@ -6617,7 +6771,8 @@ } /** - * Returns a value indicating a character's general category. + * Returns a value indicating the general category of the + * specified character (Unicode code point). * * @param codePoint the character (Unicode code point) to be tested. * @return a value of type <code>int</code> representing the @@ -6697,7 +6852,7 @@ /** * Returns the Unicode directionality property for the given - * character. Character directionality is used to calculate the + * character (Java <code>char</code>). Character directionality is used to calculate the * visual ordering of text. The directionality value of undefined * <code>char</code> values is <code>DIRECTIONALITY_UNDEFINED</code>. * @@ -6740,7 +6895,7 @@ * Returns the Unicode directionality property for the given * character (Unicode code point). Character directionality is * used to calculate the visual ordering of text. The - * directionality value of undefined character is {@link + * directionality value of an undefined character is {@link * #DIRECTIONALITY_UNDEFINED}. * * @param codePoint the character (Unicode code point) for which @@ -6774,7 +6929,7 @@ } /** - * Determines whether the character is mirrored according to the + * Determines whether the character (Java <code>char</code>) is mirrored according to the * Unicode specification. Mirrored characters should have their * glyphs horizontally mirrored when displayed in text that is * right-to-left. 
For example, <code>'\u0028'</code> LEFT @@ -6884,7 +7039,7 @@ * @since 1.4 */ static char[] toUpperCaseCharArray(int codePoint) { - // As of Unicode 4.0, 1:M uppercasings only happen in the BMP. + // As of Unicode 6.0, 1:M uppercasings only happen in the BMP. assert isBmpCodePoint(codePoint); return CharacterData.of(codePoint).toUpperCaseCharArray(codePoint); } @@ -6917,7 +7072,7 @@ * Note: if the specified character is not assigned a name by * the <i>UnicodeData</i> file (part of the Unicode Character * Database maintained by the Unicode Consortium), the returned - * name is the same as the result of expression + * name is the same as the result of expression. * * <blockquote><code> * Character.UnicodeBlock.of(codePoint)