Joe: This is a defensive approach that I believe has minimal cost. public static boolean isHighSurrogate(char ch) { // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1); }
> On Jul 15, 2020, at 3:32 PM, naoto.s...@oracle.com wrote: > > Hi Joe, > > Thank you for your review. > > On 7/15/20 10:57 AM, Joe Wang wrote: >> Hi Naoto, >> In StringUTF16.java, if one is isHighSurrogate and the other not, you may >> quickly return without going through the rest of the process, probably not >> significant as cp1 and cp2 and/or u1 and u2 won't be equal anyways. But it >> could skip a couple of toCodePoint/toUpperCase/toLowerCase calls. > > Yes, that is correct as of now, which is based on the assumption that case > mappings do not cross BMP and supplementary planes boundary. I could not find > any description where that's given or not. So I just took it to be safe. > > Naoto > >> -Joe >> On 7/15/20 9:00 AM, naoto.s...@oracle.com wrote: >>> Hello, >>> >>> Please review the fix to the following issues: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8248655 >>> https://bugs.openjdk.java.net/browse/JDK-8248434 >>> >>> The proposed changeset and its CSR are located at: >>> >>> https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8248664 >>> >>> A bug was filed against SimpleDateFormat (8248434) where case-insensitive >>> date format/parse failed in some of the new locales in JDK15. The root >>> cause was that case-insensitive String.regionMatches() method did not work >>> with supplementary characters. The problem is that the method's spec does >>> not expect case mappings of supplementary characters, possibly because it >>> was overlooked in the first place, JSR 204 - "Unicode Supplementary >>> Character support". Similar behavior is observed in other two >>> case-insensitive methods, i.e., compareToIgnoreCase() and >>> equalsIgnoreCase(). >>> >>> The fix is straightforward to compare strings by code point basis, instead >>> of code unit (16bit "char") basis. Technically this change will introduce a >>> backward incompatibility, but I believe it is an incompatibility to wrong >>> behavior, not true to the meaning of those methods' expectations. >>> >>> Naoto