> On 19 Sep 2018, at 23:21, Stuart Marks <stuart.ma...@oracle.com> wrote:
>
> ...
>
> 2979 * Each unicode escape in the form \unnnn is translated to the
> 2980 * unicode character whose code point is {@code 0xnnnn}. Care should be
> 2981 * taken when using UTF-16 surrogate pairs to ensure that the high
> 2982 * surrogate (U+D800..U+DBFF) is immediately followed by a low surrogate
> 2983 * (U+DC00..U+DFFF) otherwise a
> 2984 * {@link java.nio.charset.CharacterCodingException} may occur during UTF-8
> 2985 * decoding.
>
> I know you're going to update this based on Naoto's comments, but I'd suggest
> rethinking this section. The \unnnn construct is called a "Unicode escape"
> per JLS 3.3, but how it's handled has little to do with Unicode. The nnnn
> digits are simply translated into a 16-bit 'char' value. Any such value will
> work, even if it's an invalid UTF-16 code unit (such as 0xFFF0) or an
> unpaired surrogate.
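For concreteness, here is a minimal sketch of the behaviour described in the quoted text. It is not the code under review: the class name and the helpers translate, hasUnpairedSurrogate and strictUtf8EncodeFails are hypothetical, made up for illustration. It assumes the escape digits are simply parsed into a 16-bit char, and shows that a String happily holds an unpaired surrogate; a CharacterCodingException only appears when a strict CharsetEncoder is used explicitly.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class UnicodeEscapeSketch {

    // Hypothetical helper, NOT the JDK implementation: turns the four hex
    // digits of a backslash-u escape into one 16-bit char, with no validation.
    static char translate(String hexDigits) {
        return (char) Integer.parseInt(hexDigits, 16);
    }

    // Returns true if s contains a surrogate that is not part of a
    // well-formed high/low pair.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return true; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate with no preceding high surrogate
            }
        }
        return false;
    }

    // Returns true if a strict (REPORT) UTF-8 encoder rejects s with a
    // CharacterCodingException.
    static boolean strictUtf8EncodeFails(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(s));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Building the string succeeds even though D800 is an unpaired surrogate.
        String lone = "a" + translate("D800") + "b";
        System.out.println("length = " + lone.length());
        System.out.println("unpaired surrogate? " + hasUnpairedSurrogate(lone));
        System.out.println("strict UTF-8 encode fails? " + strictUtf8EncodeFails(lone));

        // A non-character such as FFFE is just another char value here.
        String nonChar = String.valueOf(translate("FFFE"));
        System.out.println("strict UTF-8 encode fails? " + strictUtf8EncodeFails(nonChar));
    }
}
```

A check along the lines of hasUnpairedSurrogate is also one way the "leave it as-is" alternative could be detected, though whether that is desirable is exactly the open question.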
I had a similar comment/question. CCE is a checked exception, and since the method does not declare that it throws CCE, I took a look at the implementation and came to the same conclusion as Stuart.

Additionally, why should non-character code points, like \uFFFE, be translated? If it’s a non-character code point or a malformed surrogate pair, would it not be better to just leave it as-is?

-Chris.