DOC PATCH: java.lang.Character fixes (doc only, not code)

Tom Christiansen Thu, 14 Apr 2011 19:44:33 -0700

Sherman,

In the spirit of open source development and the whole OpenJDK project, I offer
all you hardworking folks this patch to j.l.Character's embedded javadoc.


(I also have some comments on the code, but those I'll send under
separate cover.)

I set out to fix nothing more than the "errors of commission", meaning
the factual misstatements, contained in the class's documentation.  But
while I was in there, I couldn't help but also address a few of what
one might in contrast call "errors of omission".  Between the two
sorts of fixes, I think the patch makes your document a lot more accurate,
and therefore a lot more useful.
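
For anyone who wants to check the general-category claims in the patch,
here is a minimal sketch, not part of the patch itself.  The class name
CasedNonLetters and the helper gc() are made up for illustration; only
Character.getType(int) and its constants are real API.  It prints only the
category, because what isLowerCase() and isUpperCase() return for these
code points can differ between JDK releases.

```java
/** Prints the general category of a few cased characters, most of which are not Lu or Ll. */
final class CasedNonLetters {

    /** Maps a Character.getType() constant to its two-letter Unicode abbreviation. */
    static String gc(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER: return "Lu";
            case Character.LOWERCASE_LETTER: return "Ll";
            case Character.TITLECASE_LETTER: return "Lt";
            case Character.MODIFIER_LETTER:  return "Lm";
            case Character.OTHER_LETTER:     return "Lo";
            case Character.LETTER_NUMBER:    return "Nl";
            case Character.NON_SPACING_MARK: return "Mn";
            case Character.OTHER_SYMBOL:     return "So";
            default:                         return "??"; // category not covered by this sketch
        }
    }

    public static void main(String[] args) {
        int[] samples = {
            'a',     // LATIN SMALL LETTER A
            0x02B0,  // MODIFIER LETTER SMALL H
            0x2170,  // SMALL ROMAN NUMERAL ONE
            0x24D0,  // CIRCLED LATIN SMALL LETTER A
            0x0345,  // COMBINING GREEK YPOGEGRAMMENI
            0x01C5,  // LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
        };
        for (int cp : samples) {
            System.out.printf("U+%04X GC=%s%n", cp, gc(cp));
        }
    }
}
```

Only the first sample is a lowercase letter in the GC=Ll sense; the rest
are cased characters that fall in other categories.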

In all this, I kept to the same style and tone found in the existing text.
I also fixed a very few typos, but those I wouldn't have bothered you with.

This is a very brief patch.  Next I'll fix j.u.regex.Pattern's documentation,
but that, I am afraid, is going to take a bit more work than this did,
which was really fast and easy to fix, all things considered.

Hope this helps!

--tom
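
P.S. For anyone who wants to see the simple-versus-full case mapping
distinction that the patch below documents, here is a minimal sketch, not
part of the patch itself.  The class name SimpleVersusFullCasing and its
two helpers are made up for illustration; only Character.toUpperCase(char)
and String.toUpperCase(Locale) are real API.

```java
import java.util.Locale;

/** Contrasts the Character (simple) and String (full) uppercase mappings. */
final class SimpleVersusFullCasing {

    /** Simple mapping: always exactly one char comes back. */
    static char simpleUpper(char ch) {
        return Character.toUpperCase(ch);
    }

    /** Full mapping: may expand to several chars; Locale.ROOT avoids locale-specific rules. */
    static String fullUpper(char ch) {
        return String.valueOf(ch).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        char[] samples = {
            '\u00DF',  // LATIN SMALL LETTER SHARP S
            '\uFB01',  // LATIN SMALL LIGATURE FI
            'a',       // LATIN SMALL LETTER A, where both mappings agree
        };
        for (char ch : samples) {
            System.out.printf("U+%04X simple=U+%04X full=%s%n",
                    (int) ch, (int) simpleUpper(ch), fullUpper(ch));
        }
    }
}
```

For the first two samples the simple mapping hands back the input
unchanged, while the full mapping expands to two characters.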

--- java_lang_Character.java    2011-04-14 17:15:17.000000000 -0600
+++ java_lang_Character.java-EDIT       2011-04-14 19:41:19.000000000 -0600
@@ -59,14 +59,14 @@
  * <p>The <code>char</code> data type (and therefore the value that a
  * <code>Character</code> object encapsulates) are based on the
  * original Unicode specification, which defined characters as
- * fixed-width 16-bit entities. The Unicode standard has since been
+ * fixed-width 16-bit entities. The Unicode Standard has since been
  * changed to allow for characters whose representation requires more
  * than 16 bits.  The range of legal <em>code point</em>s is now
  * U+0000 to U+10FFFF, known as <em>Unicode scalar value</em>.
  * (Refer to the <a
  * href="http://www.unicode.org/reports/tr27/#notation"><i>
  * definition</i></a> of the U+<i>n</i> notation in the Unicode
- * standard.)
+ * Standard.)
  *
  * <p><a name="BMP">The set of characters from U+0000 to U+FFFF is
  * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>.
@@ -5198,13 +5198,14 @@
     }
 
     /**
-     * Determines if the specified character is a lowercase character.
+     * Determines if the specified character (Java <code>char</code>)
+     * is a lowercase letter.
      * <p>
-     * A character is lowercase if its general category type, provided
-     * by <code>Character.getType(ch)</code>, is
+     * A character is a lowercase letter (GC=Ll) if its general category 
+     * type, provided by <code>Character.getType(ch)</code>, is
      * <code>LOWERCASE_LETTER</code>.
      * <p>
-     * The following are examples of lowercase characters:
+     * The following are examples of lowercase letters:
      * <p><blockquote><pre>
      * a b c d e f g h i j k l m n o p q r s t u v w x y z
      * '&#92;u00DF' '&#92;u00E0' '&#92;u00E1' '&#92;u00E2' '&#92;u00E3' '&#92;u00E4' '&#92;u00E5' '&#92;u00E6'
@@ -5212,7 +5213,14 @@
      * '&#92;u00EF' '&#92;u00F0' '&#92;u00F1' '&#92;u00F2' '&#92;u00F3' '&#92;u00F4' '&#92;u00F5' '&#92;u00F6'
      * '&#92;u00F8' '&#92;u00F9' '&#92;u00FA' '&#92;u00FB' '&#92;u00FC' '&#92;u00FD' '&#92;u00FE' '&#92;u00FF'
      * </pre></blockquote>
-     * <p> Many other Unicode characters are lowercase too.
+     * <p> Many other Unicode characters are lowercase, too, including many
+     * modifier letters and subscripts (which are GC=Lm, not GC=Ll), some 
+     * Roman numerals (GC=Nl), some circled letters (GC=So in the
+     * Block=Enclosed_Alphanumerics), and even one combining character, 
+     * U+0345 COMBINING GREEK YPOGEGRAMMENI (GC=Mn).  However, because
+     * those lowercase characters are not lowercase <i>letters</i>, this 
+     * method will not identify them as lowercase.  There are 159 such code 
+     * points as of Unicode 6.0.<p>
      *
      * <p><b>Note:</b> This method cannot handle <a
      * href="#supplementary"> supplementary characters</a>. To support
@@ -5232,14 +5240,14 @@
     }
 
     /**
-     * Determines if the specified character (Unicode code point) is a
-     * lowercase character.
+     * Determines if the specified Unicode code point is a
+     * lowercase letter.
      * <p>
-     * A character is lowercase if its general category type, provided
-     * by {@link Character#getType getType(codePoint)}, is
+     * A character is a lowercase letter (GC=Ll) if its general category type, 
+     * provided by {@link Character#getType getType(codePoint)}, is
      * <code>LOWERCASE_LETTER</code>.
      * <p>
-     * The following are examples of lowercase characters:
+     * The following are examples of lowercase letters:
      * <p><blockquote><pre>
      * a b c d e f g h i j k l m n o p q r s t u v w x y z
      * '&#92;u00DF' '&#92;u00E0' '&#92;u00E1' '&#92;u00E2' '&#92;u00E3' '&#92;u00E4' '&#92;u00E5' '&#92;u00E6'
@@ -5247,7 +5255,14 @@
      * '&#92;u00EF' '&#92;u00F0' '&#92;u00F1' '&#92;u00F2' '&#92;u00F3' '&#92;u00F4' '&#92;u00F5' '&#92;u00F6'
      * '&#92;u00F8' '&#92;u00F9' '&#92;u00FA' '&#92;u00FB' '&#92;u00FC' '&#92;u00FD' '&#92;u00FE' '&#92;u00FF'
      * </pre></blockquote>
-     * <p> Many other Unicode characters are lowercase too.
+     * <p> Many other Unicode characters are lowercase, too, including many
+     * modifier letters and subscripts (which are GC=Lm, not GC=Ll), some 
+     * Roman numerals (GC=Nl), some circled letters (GC=So in the
+     * Block=Enclosed_Alphanumerics), and even one combining character, 
+     * U+0345 COMBINING GREEK YPOGEGRAMMENI (GC=Mn).  However, because
+     * those lowercase characters are not lowercase <i>letters</i>, this 
+     * method will not identify them as lowercase.  There are 159 such code 
+     * points as of Unicode 6.0.<p>
      *
      * @param   codePoint the character (Unicode code point) to be tested.
      * @return  <code>true</code> if the character is lowercase;
@@ -5263,12 +5278,12 @@
     }
 
     /**
-     * Determines if the specified character is an uppercase character.
+     * Determines if the specified character (Java <code>char</code>) is an uppercase letter.
      * <p>
-     * A character is uppercase if its general category type, provided by
+     * A character is an uppercase letter (GC=Lu) if its general category type, provided by
      * <code>Character.getType(ch)</code>, is <code>UPPERCASE_LETTER</code>.
      * <p>
-     * The following are examples of uppercase characters:
+     * The following are examples of uppercase letters:
      * <p><blockquote><pre>
      * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
      * '&#92;u00C0' '&#92;u00C1' '&#92;u00C2' '&#92;u00C3' '&#92;u00C4' '&#92;u00C5' '&#92;u00C6' '&#92;u00C7'
@@ -5276,7 +5291,12 @@
      * '&#92;u00D0' '&#92;u00D1' '&#92;u00D2' '&#92;u00D3' '&#92;u00D4' '&#92;u00D5' '&#92;u00D6' '&#92;u00D8'
      * '&#92;u00D9' '&#92;u00DA' '&#92;u00DB' '&#92;u00DC' '&#92;u00DD' '&#92;u00DE'
      * </pre></blockquote>
-     * <p> Many other Unicode characters are uppercase too.<p>
+     * <p>Many other Unicode characters are uppercase, too, including some
+     * Roman numerals (which are GC=Nl, not GC=Lu) and some circled
+     * letters (GC=So in the Block=Enclosed_Alphanumerics). However,
+     * because those uppercase characters are not uppercase
+     * <i>letters</i>, this method will not identify them as being
+     * uppercase. There are 42 such characters as of Unicode 6.0.<p>
      *
      * <p><b>Note:</b> This method cannot handle <a
      * href="#supplementary"> supplementary characters</a>. To support
@@ -5297,12 +5317,12 @@
     }
 
     /**
-     * Determines if the specified character (Unicode code point) is an uppercase character.
+     * Determines if the specified Unicode code point is an uppercase letter.
      * <p>
-     * A character is uppercase if its general category type, provided by
+     * A character is an uppercase letter (GC=Lu) if its general category type, provided by
      * {@link Character#getType(int) getType(codePoint)}, is <code>UPPERCASE_LETTER</code>.
      * <p>
-     * The following are examples of uppercase characters:
+     * The following are examples of uppercase letters:
      * <p><blockquote><pre>
      * A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
      * '&#92;u00C0' '&#92;u00C1' '&#92;u00C2' '&#92;u00C3' '&#92;u00C4' '&#92;u00C5' '&#92;u00C6' '&#92;u00C7'
@@ -5310,7 +5330,12 @@
      * '&#92;u00D0' '&#92;u00D1' '&#92;u00D2' '&#92;u00D3' '&#92;u00D4' '&#92;u00D5' '&#92;u00D6' '&#92;u00D8'
      * '&#92;u00D9' '&#92;u00DA' '&#92;u00DB' '&#92;u00DC' '&#92;u00DD' '&#92;u00DE'
      * </pre></blockquote>
-     * <p> Many other Unicode characters are uppercase too.<p>
+     * <p>Many other Unicode characters are uppercase, too, including some
+     * Roman numerals (which are GC=Nl, not GC=Lu) and some circled
+     * letters (GC=So in the Block=Enclosed_Alphanumerics). However,
+     * because those uppercase characters are not uppercase
+     * <i>letters</i>, this method will not identify them as being
+     * uppercase. There are 42 such characters as of Unicode 6.0.<p>
      *
      * @param   codePoint the character (Unicode code point) to be tested.
      * @return  <code>true</code> if the character is uppercase;
@@ -5326,14 +5351,14 @@
     }
 
     /**
-     * Determines if the specified character is a titlecase character.
+     * Determines if the specified character (Java <code>char</code>) is a titlecase letter.
      * <p>
-     * A character is a titlecase character if its general
+     * A character is a titlecase letter (GC=Lt) if its general
      * category type, provided by <code>Character.getType(ch)</code>,
      * is <code>TITLECASE_LETTER</code>.
      * <p>
-     * Some characters look like pairs of Latin letters. For example, there
-     * is an uppercase letter that looks like "LJ" and has a corresponding
+     * Some characters look like pairs of letters. For example, there
+     * is an uppercase Latin letter that looks like "LJ" and has a corresponding
      * lowercase letter that looks like "lj". A third form, which looks like "Lj",
      * is the appropriate form to use when rendering a word in lowercase
      * with initial capitals, as for a book title.
@@ -5345,8 +5370,12 @@
      * <li><code>LATIN CAPITAL LETTER L WITH SMALL LETTER J</code>
      * <li><code>LATIN CAPITAL LETTER N WITH SMALL LETTER J</code>
      * <li><code>LATIN CAPITAL LETTER D WITH SMALL LETTER Z</code>
+     * <li><code>GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI</code>
+     * <li><code>GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI</code>
      * </ul>
-     * <p> Many other Unicode characters are titlecase too.<p>
+     * <p> Many other Unicode characters are titlecase letters, too.
+     * As of Unicode 6.0, there are 31 titlecase characters, all of 
+     * which are titlecase letters.<p>
      *
      * <p><b>Note:</b> This method cannot handle <a
      * href="#supplementary"> supplementary characters</a>. To support
@@ -5367,14 +5396,14 @@
     }
 
     /**
-     * Determines if the specified character (Unicode code point) is a titlecase character.
+     * Determines if the specified Unicode code point is a titlecase letter.
      * <p>
-     * A character is a titlecase character if its general
-     * category type, provided by {@link Character#getType(int) getType(codePoint)},
+     * A character is a titlecase letter (GC=Lt) if its general
+     * category type, provided by {@link Character#getType(int) getType(codePoint)},
      * is <code>TITLECASE_LETTER</code>.
      * <p>
-     * Some characters look like pairs of Latin letters. For example, there
-     * is an uppercase letter that looks like "LJ" and has a corresponding
+     * Some characters look like pairs of letters. For example, there
+     * is an uppercase Latin letter that looks like "LJ" and has a corresponding
      * lowercase letter that looks like "lj". A third form, which looks like "Lj",
      * is the appropriate form to use when rendering a word in lowercase
      * with initial capitals, as for a book title.
@@ -5386,8 +5415,12 @@
      * <li><code>LATIN CAPITAL LETTER L WITH SMALL LETTER J</code>
      * <li><code>LATIN CAPITAL LETTER N WITH SMALL LETTER J</code>
      * <li><code>LATIN CAPITAL LETTER D WITH SMALL LETTER Z</code>
+     * <li><code>GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI</code>
+     * <li><code>GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI</code>
      * </ul>
-     * <p> Many other Unicode characters are titlecase too.<p>
+     * <p> Many other Unicode characters are titlecase letters, too.
+     * As of Unicode 6.0, there are 31 titlecase characters, all of 
+     * which are titlecase letters.<p>
      *
      * @param   codePoint the character (Unicode code point) to be tested.
      * @return  <code>true</code> if the character is titlecase;
@@ -5405,7 +5438,7 @@
     /**
      * Determines if the specified character is a digit.
      * <p>
-     * A character is a digit if its general category type, provided
+     * A character is a digit (GC=Nd) if its general category type, provided
      * by <code>Character.getType(ch)</code>, is
      * <code>DECIMAL_DIGIT_NUMBER</code>.
      * <p>
@@ -5444,7 +5477,7 @@
     /**
      * Determines if the specified character (Unicode code point) is a digit.
      * <p>
-     * A character is a digit if its general category type, provided
+     * A character is a digit (GC=Nd) if its general category type, provided
      * by {@link Character#getType(int) getType(codePoint)}, is
      * <code>DECIMAL_DIGIT_NUMBER</code>.
      * <p>
@@ -5529,7 +5562,7 @@
     }
 
     /**
-     * Determines if the specified character is a letter.
+     * Determines if the specified character (Java <code>char</code>) is a letter.
      * <p>
      * A character is considered to be a letter if its general
      * category type, provided by <code>Character.getType(ch)</code>,
@@ -5542,13 +5575,20 @@
      * <li> <code>OTHER_LETTER</code>
      * </ul>
      *
-     * Not all letters have case. Many characters are
-     * letters but are neither uppercase nor lowercase nor titlecase.
+     * Not all letters have case, and not all cased characters are
+     * letters. Many characters are letters but are neither uppercase
+     * (GC=Lu) nor lowercase (GC=Ll) nor titlecase (GC=Lt).  Letters
+     * without case are either Modifier_Letters (GC=Lm) or Other_Letters
+     * (GC=Lo), but some modifier letters <i>do</i> have case. Similarly, not all
+     * characters with case are letters, such as the Roman numerals, which
+     * are Letter_Numbers (GC=Nl) and the circled letters, which are
+     * Other_Symbols (GC=So).  There are 201 cased characters as of
+     * Unicode 6.0 that are not uppercase, lowercase, or titlecase letters.
      *
      * <p><b>Note:</b> This method cannot handle <a
      * href="#supplementary"> supplementary characters</a>. To support
      * all Unicode characters, including supplementary characters, use
-     * the {@link #isLetter(int)} method.
+     * the {@link #isLetter(int)} method.<p>
      *
      * @param   ch   the character to be tested.
      * @return  <code>true</code> if the character is a letter;
@@ -5581,8 +5621,15 @@
      * <li> <code>OTHER_LETTER</code>
      * </ul>
      *
-     * Not all letters have case. Many characters are
-     * letters but are neither uppercase nor lowercase nor titlecase.
+     * Not all letters have case, and not all cased characters are
+     * letters. Many characters are letters but are neither uppercase
+     * (GC=Lu) nor lowercase (GC=Ll) nor titlecase (GC=Lt).  Letters
+     * without case are either Modifier_Letters (GC=Lm) or Other_Letters
+     * (GC=Lo), but some modifier letters <i>do</i> have case. Similarly, not all
+     * characters with case are letters, such as the Roman numerals, which
+     * are Letter_Numbers (GC=Nl) and the circled letters, which are
+     * Other_Symbols (GC=So).  There are 201 cased characters as of
+     * Unicode 6.0 that are not uppercase, lowercase, or titlecase letters.<p>
      *
      * @param   codePoint the character (Unicode code point) to be tested.
      * @return  <code>true</code> if the character is a letter;
@@ -5606,7 +5653,7 @@
     }
 
     /**
-     * Determines if the specified character is a letter or digit.
+     * Determines if the specified character (Java <code>char</code>) is a letter or digit.
      * <p>
      * A character is considered to be a letter or digit if either
      * <code>Character.isLetter(char ch)</code> or
@@ -5661,14 +5708,14 @@
     }
 
     /**
-     * Determines if the specified character is permissible as the first
+     * Determines if the specified character (Java <code>char</code>) is permissible as the first
      * character in a Java identifier.
      * <p>
      * A character may start a Java identifier if and only if
      * one of the following is true:
      * <ul>
      * <li> {@link #isLetter(char) isLetter(ch)} returns <code>true</code>
-     * <li> {@link #getType(char) getType(ch)} returns <code>LETTER_NUMBER</code>
+     * <li> {@link #getType(char) getType(ch)} returns <code>LETTER_NUMBER</code> (GC=Nl)
      * <li> ch is a currency symbol (such as "$")
      * <li> ch is a connecting punctuation character (such as "_").
      * </ul>
@@ -5691,7 +5738,7 @@
     }
 
     /**
-     * Determines if the specified character may be part of a Java
+     * Determines if the specified character (Java <code>char</code>) may be part of a Java
      * identifier as other than the first character.
      * <p>
      * A character may be part of a Java identifier if and only if any
@@ -5727,7 +5774,7 @@
     }
 
     /**
-     * Determines if the specified character is
+     * Determines if the specified character (Java <code>char</code>) is
      * permissible as the first character in a Java identifier.
      * <p>
      * A character may start a Java identifier if and only if
@@ -5787,7 +5834,7 @@
     }
 
     /**
-     * Determines if the specified character may be part of a Java
+     * Determines if the specified character (Java <code>char</code>) may be part of a Java
      * identifier as other than the first character.
      * <p>
      * A character may be part of a Java identifier if any of the following
@@ -5857,7 +5904,7 @@
     }
 
     /**
-     * Determines if the specified character is permissible as the
+     * Determines if the specified character (Java <code>char</code>) is permissible as the
      * first character in a Unicode identifier.
      * <p>
      * A character may start a Unicode identifier if and only if
@@ -5910,7 +5957,7 @@
     }
 
     /**
-     * Determines if the specified character may be part of a Unicode
+     * Determines if the specified character (Java <code>char</code>) may be part of a Unicode
      * identifier as other than the first character.
      * <p>
      * A character may be part of a Unicode identifier if and only if
@@ -5974,7 +6021,7 @@
     }
 
     /**
-     * Determines if the specified character should be regarded as
+     * Determines if the specified character (Java <code>char</code>) should be regarded as
      * an ignorable character in a Java identifier or a Unicode identifier.
      * <p>
      * The following Unicode characters are ignorable in a Java identifier
@@ -6039,20 +6086,34 @@
     }
 
     /**
-     * Converts the character argument to lowercase using case
-     * mapping information from the UnicodeData file.
+     * Converts the character (Java <code>char</code>) argument to lowercase
+     * using simple case mapping information from the UnicodeData file.
      * <p>
      * Note that
      * <code>Character.isLowerCase(Character.toLowerCase(ch))</code>
-     * does not always return <code>true</code> for some ranges of
-     * characters, particularly those that are symbols or ideographs.
+     * does not return <code>true</code> for some ranges of
+     * lowercase characters, particularly those that are symbols or ideographs,
+     * or lowercase modifier letters.
+     *
+     * <p><b>Note:</b> This method cannot handle characters whose lowercase mapping
+     * according to the SpecialCasing file in the Unicode specification
+     * returns more than one character. As of Unicode 6.0, there is only
+     * one such code point (if locales are not considered):
+     *
+     * <ul>
+     * <li><code> LATIN CAPITAL LETTER I WITH DOT ABOVE</code></li>
+     * </ul>
      *
      * <p>In general, {@link String#toLowerCase()} should be used to map
      * characters to lowercase. <code>String</code> case mapping methods
      * have several benefits over <code>Character</code> case mapping methods.
      * <code>String</code> case mapping methods can perform locale-sensitive
      * mappings, context-sensitive mappings, and 1:M character mappings, whereas
-     * the <code>Character</code> case mapping methods cannot.
+     * the <code>Character</code> case mapping methods cannot.  In Unicode terminology,
+     * <code>Character</code> case mappings are <i>simple case mappings</i> (because they
+     * can only map to a single character), while <code>String</code> case mappings
+     * are <i>full case mappings</i>, because they can map to multiple characters,
+     * as defined by the SpecialCasing file in the Unicode specification.
      *
      * <p><b>Note:</b> This method cannot handle <a
      * href="#supplementary"> supplementary characters</a>. To support
@@ -6071,20 +6132,33 @@
 
     /**
      * Converts the character (Unicode code point) argument to
-     * lowercase using case mapping information from the UnicodeData
+     * lowercase using simple case mapping information from the UnicodeData
      * file.
      *
      * <p> Note that
      * <code>Character.isLowerCase(Character.toLowerCase(codePoint))</code>
-     * does not always return <code>true</code> for some ranges of
-     * characters, particularly those that are symbols or ideographs.
+     * does not return <code>true</code> for some ranges of
+     * lowercase characters, particularly those that are symbols or ideographs,
+     * or lowercase modifier letters.
+     *
+     * <p><b>Note:</b> This method cannot handle characters whose lowercase mapping
+     * according to the SpecialCasing file returns more than one character.
+     * As of Unicode 6.0, there is only one such code point,
+     * if locales are not considered:
+     * <ul>
+     * <li><code> LATIN CAPITAL LETTER I WITH DOT ABOVE</code></li>
+     * </ul>
      *
      * <p>In general, {@link String#toLowerCase()} should be used to map
      * characters to lowercase. <code>String</code> case mapping methods
      * have several benefits over <code>Character</code> case mapping methods.
      * <code>String</code> case mapping methods can perform locale-sensitive
      * mappings, context-sensitive mappings, and 1:M character mappings, whereas
-     * the <code>Character</code> case mapping methods cannot.
+     * the <code>Character</code> case mapping methods cannot.  In Unicode terminology,
+     * <code>Character</code> case mappings are <i>simple case mappings</i> (because they
+     * can only map to a single character), while <code>String</code> case mappings
+     * are <i>full case mappings</i>, because they can map to multiple characters,
+     * as defined by the SpecialCasing file in the Unicode specification.
      *
      * @param   codePoint   the character (Unicode code point) to be converted.
      * @return  the lowercase equivalent of the character (Unicode code
@@ -6099,20 +6173,39 @@
     }
 
     /**
-     * Converts the character argument to uppercase using case mapping
-     * information from the UnicodeData file.
+     * Converts the character (Java <code>char</code>) argument to 
+     * uppercase using simple case mapping information from the 
+     * UnicodeData file.
+     *
      * <p>
      * Note that
      * <code>Character.isUpperCase(Character.toUpperCase(ch))</code>
-     * does not always return <code>true</code> for some ranges of
-     * characters, particularly those that are symbols or ideographs.
+     * does not return <code>true</code> for some ranges of
+     * uppercase characters, particularly those that are symbols or ideographs.
+     *
+     * <p><b>Note:</b> This method cannot handle characters whose uppercase mapping
+     * according to the SpecialCasing file in the Unicode specification
+     * returns more than one character.  Examples of such code points include:
+     * <ul>
+     * <li><code>LATIN SMALL LETTER SHARP S</code></li>
+     * <li><code>LATIN SMALL LETTER J WITH CARON</code></li>
+     * <li><code>LATIN SMALL LETTER A WITH RIGHT HALF RING</code></li>
+     * <li><code>GREEK SMALL LETTER UPSILON WITH PSILI</code></li>
+     * <li><code>LATIN SMALL LIGATURE FI</code></li>
+     * <li><code>LATIN SMALL LIGATURE ST</code></li>
+     * </ul>
+     * <p>As of Unicode 6.0, there are 102 such code points.
      *
      * <p>In general, {@link String#toUpperCase()} should be used to map
      * characters to uppercase. <code>String</code> case mapping methods
      * have several benefits over <code>Character</code> case mapping methods.
      * <code>String</code> case mapping methods can perform locale-sensitive
      * mappings, context-sensitive mappings, and 1:M character mappings, whereas
-     * the <code>Character</code> case mapping methods cannot.
+     * the <code>Character</code> case mapping methods cannot.  In Unicode terminology,
+     * <code>Character</code> case mappings are <i>simple case mappings</i> because they
+     * can only map to a single character, while <code>String</code> case mappings
+     * are <i>full case mappings</i> because they can map to multiple characters,
+     * as defined by the SpecialCasing file in the Unicode specification.
      *
      * <p><b>Note:</b> This method cannot handle <a
      * href="#supplementary"> supplementary characters</a>. To support
@@ -6131,20 +6224,37 @@
 
     /**
      * Converts the character (Unicode code point) argument to
-     * uppercase using case mapping information from the UnicodeData
+     * uppercase using simple case mapping information from the UnicodeData
      * file.
      *
      * <p>Note that
      * <code>Character.isUpperCase(Character.toUpperCase(codePoint))</code>
-     * does not always return <code>true</code> for some ranges of
-     * characters, particularly those that are symbols or ideographs.
+     * does not return <code>true</code> for some ranges of
+     * uppercase characters, particularly those that are symbols or ideographs.
+     *
+     * <p><b>Note:</b> This method cannot handle characters whose uppercase mapping
+     * according to the SpecialCasing file returns more than one character.
+     * Examples of such code points include:
+     * <ul>
+     * <li><code>LATIN SMALL LETTER SHARP S</code></li>
+     * <li><code>LATIN SMALL LETTER J WITH CARON</code></li>
+     * <li><code>LATIN SMALL LETTER A WITH RIGHT HALF RING</code></li>
+     * <li><code>GREEK SMALL LETTER UPSILON WITH PSILI</code></li>
+     * <li><code>LATIN SMALL LIGATURE FI</code></li>
+     * <li><code>LATIN SMALL LIGATURE ST</code></li>
+     * </ul>
+     * <p>As of Unicode 6.0, there are 102 such code points.
      *
      * <p>In general, {@link String#toUpperCase()} should be used to map
      * characters to uppercase. <code>String</code> case mapping methods
      * have several benefits over <code>Character</code> case mapping methods.
      * <code>String</code> case mapping methods can perform locale-sensitive
      * mappings, context-sensitive mappings, and 1:M character mappings, whereas
-     * the <code>Character</code> case mapping methods cannot.
+     * the <code>Character</code> case mapping methods cannot.  In Unicode terminology,
+     * <code>Character</code> case mappings are <i>simple case mappings</i> because they
+     * can only map to a single character, while <code>String</code> case mappings
+     * are <i>full case mappings</i> because they can map to multiple characters
+     * as defined by the SpecialCasing file in the Unicode specification.
      *
      * @param   codePoint   the character (Unicode code point) to be converted.
      * @return  the uppercase equivalent of the character, if any;
@@ -6159,25 +6269,39 @@
     }
 
     /**
-     * Converts the character argument to titlecase using case mapping
-     * information from the UnicodeData file. If a character has no
-     * explicit titlecase mapping and is not itself a titlecase char
-     * according to UnicodeData, then the uppercase mapping is
-     * returned as an equivalent titlecase mapping. If the
-     * <code>char</code> argument is already a titlecase
-     * <code>char</code>, the same <code>char</code> value will be
-     * returned.
+     * Converts the character (Java <code>char</code>) argument to
+     * titlecase using simple case mapping information from the
+     * UnicodeData file. If a character has no explicit titlecase
+     * mapping and is not itself a titlecase char according to
+     * UnicodeData, then the simple uppercase mapping is returned as an
+     * equivalent titlecase mapping. Simple mapping means that only
+     * single-character returns are possible, and any full case mapping from
+     * the SpecialCasing file in the Unicode specification is
+     * disregarded. If the <code>char</code> argument is already a
+     * titlecase character, that same value will be returned.
+     *
      * <p>
      * Note that
      * <code>Character.isTitleCase(Character.toTitleCase(ch))</code>
-     * does not always return <code>true</code> for some ranges of
-     * characters.
+     * may not always return <code>true</code> for some ranges of
+     * titlecase characters.
      *
      * <p><b>Note:</b> This method cannot handle <a
      * href="#supplementary"> supplementary characters</a>. To support
      * all Unicode characters, including supplementary characters, use
      * the {@link #toTitleCase(int)} method.
      *
+     * <p><b>Note:</b> This method cannot handle characters whose titlecase 
+     * mapping according to the SpecialCasing file returns more than one character.
+     * Examples of such code points include:
+     * <ul>
+     * <li><code>LATIN SMALL LETTER SHARP S</code></li>
+     * <li><code>LATIN SMALL LETTER J WITH CARON</code></li>
+     * <li><code>LATIN SMALL LIGATURE FI</code></li>
+     * <li><code>LATIN SMALL LIGATURE ST</code></li>
+     * </ul>
+     * <p>As of Unicode 6.0, there are 48 such code points.
+     *
      * @param   ch   the character to be converted.
      * @return  the titlecase equivalent of the character, if any;
      *          otherwise, the character itself.
@@ -6191,20 +6315,34 @@
     }
 
     /**
-     * Converts the character (Unicode code point) argument to titlecase using case mapping
-     * information from the UnicodeData file. If a character has no
-     * explicit titlecase mapping and is not itself a titlecase char
-     * according to UnicodeData, then the uppercase mapping is
-     * returned as an equivalent titlecase mapping. If the
-     * character argument is already a titlecase
-     * character, the same character value will be
-     * returned.
+     * Converts the character (Unicode code point) argument to titlecase
+     * using simple case mapping information from the UnicodeData file. If a
+     * character has no explicit titlecase mapping and is not itself a
+     * titlecase char according to UnicodeData, then the simple uppercase
+     * mapping is returned as an equivalent titlecase mapping. Simple
+     * mapping means that only single-character returns are possible,
+     * and any full case mapping from the SpecialCasing file in the
+     * Unicode specification is disregarded. If the <code>int</code>
+     * argument is already a titlecase character, that same value will
+     * be returned.
+     *
      *
      * <p>Note that
      * <code>Character.isTitleCase(Character.toTitleCase(codePoint))</code>
      * does not always return <code>true</code> for some ranges of
      * characters.
      *
+     * <p><b>Note:</b> This method cannot handle characters whose titlecase 
+     * mapping according to the SpecialCasing file returns more than one character.
+     * Examples of such code points include:
+     * <ul>
+     * <li><code>LATIN SMALL LETTER SHARP S</code></li>
+     * <li><code>LATIN SMALL LETTER J WITH CARON</code></li>
+     * <li><code>LATIN SMALL LIGATURE FI</code></li>
+     * <li><code>LATIN SMALL LIGATURE ST</code></li>
+     * </ul>
+     * <p>As of Unicode 6.0, there are 48 such code points.
+     *
      * @param   codePoint   the character (Unicode code point) to be converted.
      * @return  the titlecase equivalent of the character, if any;
      *          otherwise, the character itself.
@@ -6306,7 +6444,7 @@
      * The letters A-Z in their uppercase (<code>'&#92;u0041'</code> through
      * <code>'&#92;u005A'</code>), lowercase
      * (<code>'&#92;u0061'</code> through <code>'&#92;u007A'</code>), and
-     * full width variant (<code>'&#92;uFF21'</code> through
+     * fullwidth variant (<code>'&#92;uFF21'</code> through
      * <code>'&#92;uFF3A'</code> and <code>'&#92;uFF41'</code> through
      * <code>'&#92;uFF5A'</code>) forms have numeric values from 10
      * through 35. This is independent of the Unicode specification,
@@ -6344,7 +6482,7 @@
      * The letters A-Z in their uppercase (<code>'&#92;u0041'</code> through
      * <code>'&#92;u005A'</code>), lowercase
      * (<code>'&#92;u0061'</code> through <code>'&#92;u007A'</code>), and
-     * full width variant (<code>'&#92;uFF21'</code> through
+     * fullwidth variant (<code>'&#92;uFF21'</code> through
      * <code>'&#92;uFF3A'</code> and <code>'&#92;uFF41'</code> through
      * <code>'&#92;uFF5A'</code>) forms have numeric values from 10
      * through 35. This is independent of the Unicode specification,
@@ -6404,11 +6542,15 @@
 
 
     /**
-     * Determines if the specified character is a Unicode space character.
-     * A character is considered to be a space character if and only if
-     * it is specified to be a space character by the Unicode standard. This
-     * method returns true if the character's general category type is any of
-     * the following:
+     * Determines if the specified character (Java <code>char</code>) is
+     * a Unicode space separator (GC=Zs), line separator (GC=Zl), or
+     * paragraph separator (GC=Zp).  This is <em>not</em> equivalent to the
+     * Unicode White_Space property, which also includes six control
+     * characters. A character is considered to be a space character if
+     * and only if it is specified to be a space, line, or paragraph
+     * separator by the Unicode Standard. This method returns true if
+     * the character's general category type is any of the following:
+     *
      * <ul>
      * <li> <code>SPACE_SEPARATOR</code>
      * <li> <code>LINE_SEPARATOR</code>
@@ -6431,10 +6573,13 @@
     }
 
     /**
-     * Determines if the specified character (Unicode code point) is a
-     * Unicode space character.  A character is considered to be a
-     * space character if and only if it is specified to be a space
-     * character by the Unicode standard. This method returns true if
+     * Determines if the specified character (Unicode code point) is
+     * a Unicode space separator (GC=Zs), line separator (GC=Zl), or
+     * paragraph separator (GC=Zp).  This is <em>not</em> equivalent to the
+     * Unicode White_Space property, which also includes six control
+     * characters. A character is considered to be a space character if
+     * and only if it is specified to be a space, line, or paragraph
+     * separator by the Unicode Standard. This method returns true if
      * the character's general category type is any of the following:
      *
      * <ul>
@@ -6475,6 +6620,8 @@
      * <li> It is <code>'&#92;u001E'</code>, U+001E RECORD SEPARATOR.
      * <li> It is <code>'&#92;u001F'</code>, U+001F UNIT SEPARATOR.
      * </ul>
+     * <p><b>Note:</b> The Unicode White_Space property is not 
+     * the same as Java whitespace.
      *
      * <p><b>Note:</b> This method cannot handle <a
      * href="#supplementary"> supplementary characters</a>. To support
@@ -6493,11 +6640,11 @@
 
     /**
      * Determines if the specified character (Unicode code point) is
-     * white space according to Java.  A character is a Java
-     * whitespace character if and only if it satisfies one of the
-     * following criteria:
+     * white space according to Java, not according to Unicode. A
+     * character is a Java whitespace character if and only if it
+     * satisfies one of the following criteria:
      * <ul>
-     * <li> It is a Unicode space character ({@link #SPACE_SEPARATOR},
+     * <li> It is a Unicode space, line, or paragraph separator ({@link #SPACE_SEPARATOR},
      *      {@link #LINE_SEPARATOR}, or {@link #PARAGRAPH_SEPARATOR})
      *      but is not also a non-breaking space (<code>'&#92;u00A0'</code>,
      *      <code>'&#92;u2007'</code>, <code>'&#92;u202F'</code>).
@@ -6511,6 +6658,8 @@
      * <li> It is <code>'&#92;u001E'</code>, U+001E RECORD SEPARATOR.
      * <li> It is <code>'&#92;u001F'</code>, U+001F UNIT SEPARATOR.
      * </ul>
+     * <p><b>Note:</b> The Unicode White_Space property is not 
+     * the same as Java whitespace.
      * <p>
      *
      * @param   codePoint the character (Unicode code point) to be tested.
@@ -6524,12 +6673,14 @@
     }
 
     /**
-     * Determines if the specified character is an ISO control
+     * Determines if the specified character (Java <code>char</code>) is an ISO control
      * character.  A character is considered to be an ISO control
      * character if its code is in the range <code>'&#92;u0000'</code>
      * through <code>'&#92;u001F'</code> or in the range
      * <code>'&#92;u007F'</code> through <code>'&#92;u009F'</code>.
      *
+     * <p><b>Note:</b> This is identical to the Unicode Control property (GC=Cc).
+     *
      * <p><b>Note:</b> This method cannot handle <a
      * href="#supplementary"> supplementary characters</a>. To support
      * all Unicode characters, including supplementary characters, use
@@ -6548,12 +6699,14 @@
     }
 
     /**
-     * Determines if the referenced character (Unicode code point) is an ISO control
+     * Determines if the specified character (Unicode code point) is an ISO control
      * character.  A character is considered to be an ISO control
      * character if its code is in the range <code>'&#92;u0000'</code>
      * through <code>'&#92;u001F'</code> or in the range
      * <code>'&#92;u007F'</code> through <code>'&#92;u009F'</code>.
      *
+     * <p><b>Note:</b> This is identical to the Unicode Control property (GC=Cc).
+     *
      * @param   codePoint the character (Unicode code point) to be tested.
      * @return  <code>true</code> if the character is an ISO control character;
      *          <code>false</code> otherwise.
@@ -6570,7 +6723,8 @@
     }
 
     /**
-     * Returns a value indicating a character's general category.
+     * Returns a value indicating the general category of the 
+     * specified character (Java <code>char</code>).
      *
      * <p><b>Note:</b> This method cannot handle <a
      * href="#supplementary"> supplementary characters</a>. To support
@@ -6617,7 +6771,8 @@
     }
 
     /**
-     * Returns a value indicating a character's general category.
+     * Returns a value indicating the general category of the 
+     * specified character (Unicode code point).
      *
      * @param   codePoint the character (Unicode code point) to be tested.
      * @return  a value of type <code>int</code> representing the
@@ -6697,7 +6852,7 @@
 
     /**
      * Returns the Unicode directionality property for the given
-     * character.  Character directionality is used to calculate the
+     * character (Java <code>char</code>).  Character directionality is used to calculate the
      * visual ordering of text. The directionality value of undefined
      * <code>char</code> values is <code>DIRECTIONALITY_UNDEFINED</code>.
      *
@@ -6740,7 +6895,7 @@
      * Returns the Unicode directionality property for the given
      * character (Unicode code point).  Character directionality is
      * used to calculate the visual ordering of text. The
-     * directionality value of undefined character is {@link
+     * directionality value of an undefined character is {@link
      * #DIRECTIONALITY_UNDEFINED}.
      *
      * @param   codePoint the character (Unicode code point) for which
@@ -6774,7 +6929,7 @@
     }
 
     /**
-     * Determines whether the character is mirrored according to the
+     * Determines whether the character (Java <code>char</code>) is mirrored according to the
      * Unicode specification.  Mirrored characters should have their
      * glyphs horizontally mirrored when displayed in text that is
      * right-to-left.  For example, <code>'&#92;u0028'</code> LEFT
@@ -6884,7 +7039,7 @@
      * @since 1.4
      */
     static char[] toUpperCaseCharArray(int codePoint) {
-        // As of Unicode 4.0, 1:M uppercasings only happen in the BMP.
+        // As of Unicode 6.0, 1:M uppercasings only happen in the BMP.
         assert isBmpCodePoint(codePoint);
         return CharacterData.of(codePoint).toUpperCaseCharArray(codePoint);
     }
@@ -6917,7 +7072,7 @@
      * Note: if the specified character is not assigned a name by
      * the <i>UnicodeData</i> file (part of the Unicode Character
      * Database maintained by the Unicode Consortium), the returned
-     * name is the same as the result of expression
+     * name is the same as the result of the expression:
      *
      * <blockquote><code>
      *     Character.UnicodeBlock.of(codePoint)
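
For anyone who wants to see the simple-versus-full case mapping distinction
that the new toTitleCase notes describe, here is a minimal sketch. It is not
part of the patch. The class and method names (TitlecaseDemo, simpleTitle,
fullUpper) are made up for illustration. Since I know of no public
full-titlecase method in the JDK, the sketch uses String.toUpperCase as a
stand-in to show a full (multi-character) mapping next to the
single-character result from Character.toTitleCase.

```java
import java.util.Locale;

class TitlecaseDemo {
    // Simple titlecase mapping: always exactly one code point in, one out.
    static int simpleTitle(int codePoint) {
        return Character.toTitleCase(codePoint);
    }

    // Full uppercase mapping via String, which may expand to several chars.
    // Used here only as a stand-in for a full case mapping.
    static String fullUpper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // SHARP S, J WITH CARON, LIGATURE FI, LIGATURE ST, and the
        // digraph DZ WITH CARON (which does have a simple titlecase form).
        int[] samples = { 0x00DF, 0x01F0, 0xFB01, 0xFB06, 0x01C6 };
        for (int cp : samples) {
            String s = new String(Character.toChars(cp));
            System.out.printf(
                "U+%04X  simple titlecase: U+%04X  full uppercase length: %d%n",
                cp, simpleTitle(cp), fullUpper(s).length());
        }
    }
}
```

For the first four samples the simple mapping hands back the input
unchanged, while the full uppercase mapping is longer than one char. That
gap is what the added note warns about.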

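The whitespace and control-character notes can be checked the same way.
Again this is only a sketch with invented names (SpaceDemo, classify,
controlMismatches), not code from the patch. It prints how isSpaceChar,
isWhitespace, and isISOControl classify a few code points, and counts the
code points where isISOControl disagrees with general category Cc.

```java
class SpaceDemo {
    // Reports how the three predicates classify one code point.
    static String classify(int cp) {
        return String.format("U+%04X spaceChar=%b whitespace=%b isoControl=%b",
                cp,
                Character.isSpaceChar(cp),
                Character.isWhitespace(cp),
                Character.isISOControl(cp));
    }

    // Counts code points where isISOControl and GC=Cc disagree.
    static int controlMismatches() {
        int mismatches = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean isCc = Character.getType(cp) == Character.CONTROL;
            if (Character.isISOControl(cp) != isCc) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // TAB, FILE SEPARATOR, SPACE, NEXT LINE, NO-BREAK SPACE, LINE SEPARATOR
        int[] samples = { 0x0009, 0x001C, 0x0020, 0x0085, 0x00A0, 0x2028 };
        for (int cp : samples) {
            System.out.println(classify(cp));
        }
        System.out.println("isISOControl vs GC=Cc mismatches: " + controlMismatches());
    }
}
```

NO-BREAK SPACE is a space character but not Java whitespace, TAB is the
reverse, and NEXT LINE (one of the six White_Space controls) is neither,
which is why the three notions cannot be used interchangeably.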