Re: RFR - JDK-8202442 - String::unescape (Code Review)

Chris Hegarty Thu, 20 Sep 2018 03:47:41 -0700


> On 19 Sep 2018, at 23:21, Stuart Marks <[email protected]> wrote:
> 
> ...
> 
> 2979      * Each unicode escape in the form \unnnn is translated to the
> 2980      * unicode character whose code point is {@code 0xnnnn}. Care should be
> 2981      * taken when using UTF-16 surrogate pairs to ensure that the high
> 2982      * surrogate (U+D800..U+DBFF) is immediately followed by a low surrogate
> 2983      * (U+DC00..U+DFFF) otherwise a
> 2984      * {@link java.nio.charset.CharacterCodingException} may occur during UTF-8
> 2985      * decoding.
> 
> 
> I know you're going to update this based on Naoto's comments, but I'd suggest 
> rethinking this section. The \unnnn construct is called a "Unicode escape" 
> per JLS 3.3, but how it's handled has little to do with Unicode. The nnnn 
> digits are simply translated into a 16-bit 'char' value. Any such value will 
> work, even if it's an invalid UTF-16 code unit (such as 0xFFF0) or an 
> unpaired surrogate.


I had a similar comment/question. CharacterCodingException (CCE) is a
checked exception, and since the method does not declare that it throws
CCE, I took a look at the implementation and came to the same conclusion
as Stuart.
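
For illustration only, here is a minimal sketch of that behaviour. It is
not the implementation under review: the class UnicodeEscapeSketch and the
helpers decodeHex4 and strictUtf8Fails are hypothetical names. It assumes
the escape digits are simply folded into a 16-bit char, and shows that an
unpaired surrogate is accepted in a String, with a CCE appearing only if
the result is later passed through a strict UTF-8 CharsetEncoder.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class UnicodeEscapeSketch {
    // Hypothetical helper: folds the four hex digits of a Unicode escape
    // into one 16-bit char value. No Unicode validity check is made.
    static char decodeHex4(String digits) {
        if (digits.length() != 4) {
            throw new IllegalArgumentException("expected exactly 4 hex digits");
        }
        int value = 0;
        for (int i = 0; i < 4; i++) {
            int d = Character.digit(digits.charAt(i), 16);
            if (d < 0) {
                throw new IllegalArgumentException("not a hex digit: " + digits.charAt(i));
            }
            value = (value << 4) | d;
        }
        return (char) value;
    }

    // Returns true if a strict UTF-8 encoder (default REPORT actions)
    // rejects the string with a CharacterCodingException.
    static boolean strictUtf8Fails(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // An unpaired high surrogate is a perfectly legal char in a String.
        String lone = "a" + decodeHex4("D800") + "b";
        System.out.println("length = " + lone.length());
        System.out.println("strict UTF-8 encoding fails = " + strictUtf8Fails(lone));
    }
}
```

Building the string never throws; the checked exception only surfaces in
the explicit encoder call, which is a separate step from the translation.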

Additionally, why should non-character code points, like \uFFFE, be
translated? If it’s a non-character code point or a malformed surrogate
pair, would it not be better to just leave it as-is?
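
To make the alternative concrete, here is a rough sketch of a "leave it
as-is" policy. Everything in it is hypothetical (LenientUnescapeSketch,
unescapeUnicode, parseEscape, isNoncharacter), it handles only Unicode
escapes, and it ignores escaped backslashes. It is one possible reading of
the suggestion, not a proposal for the actual patch.

```java
public class LenientUnescapeSketch {
    // Noncharacters: U+FDD0..U+FDEF, plus every code point whose low
    // 16 bits are FFFE or FFFF.
    static boolean isNoncharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    // Returns the 16-bit value of a backslash-u escape that starts at
    // index i, or -1 if there is no well-formed escape at that position.
    static int parseEscape(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return -1;
        }
        int value = 0;
        for (int k = i + 2; k < i + 6; k++) {
            int d = Character.digit(s.charAt(k), 16);
            if (d < 0) {
                return -1;
            }
            value = (value << 4) | d;
        }
        return value;
    }

    // Translates escapes, but copies the escape text through unchanged
    // when it names a noncharacter or an unpaired surrogate.
    static String unescapeUnicode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int first = parseEscape(s, i);
            if (first < 0) {
                out.append(s.charAt(i++));          // ordinary character
                continue;
            }
            char c = (char) first;
            if (Character.isHighSurrogate(c)) {
                int second = parseEscape(s, i + 6);
                if (second >= 0 && Character.isLowSurrogate((char) second)
                        && !isNoncharacter(Character.toCodePoint(c, (char) second))) {
                    out.append(c).append((char) second);   // well-formed pair
                    i += 12;
                } else {
                    out.append(s, i, i + 6);        // leave escape as-is
                    i += 6;
                }
            } else if (Character.isLowSurrogate(c) || isNoncharacter(c)) {
                out.append(s, i, i + 6);            // leave escape as-is
                i += 6;
            } else {
                out.append(c);
                i += 6;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeUnicode("\\u0041 and \\uFFFE and \\uD800"));
    }
}
```

With this policy the output never contains a noncharacter or an unpaired
surrogate that came from an escape, at the cost of the method no longer
being a plain digit-to-char translation.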

-Chris.
