On Tue, 6 May 2025 15:46:03 GMT, Magnus Ihse Bursie <i...@openjdk.org> wrote:
>> As part of the UTF-8 cleaning up done in
>> [JDK-8301971](https://bugs.openjdk.org/browse/JDK-8301971), I looked at
>> where and how we are using unicode sequences (`\uXXXX`). In several string
>> literals, I think the unicode sequences still have merit, if they improve
>> clarity or readability of the code. Some instances are more of a gray zone. But
>> the places where it does not make sense at all are in comments, as part of
>> fluid text comments. There they are just disruptive and not helpful at all.
>> I tried to locate all such places (but I might have missed some, since I did not
>> do a proper lexical analysis to find comments) and fix them.
>>
>> 99% of this fix is to turn poor `Peter von der Ah\u00e9` into `Peter von der
>> Ahé`. 😆
>>
>> I checked some random samples on when this was introduced to see if there
>> was some particular commit that mistreated the encoding, but they have been
>> there since the original release of the open JDK source code.
>>
>> There are likely many more places where direct UTF-8 encoded characters are
>> preferable to unicode sequences, but this seemed like a safe and trivial
>> first start.
>
> Magnus Ihse Bursie has updated the pull request with a new target base due to
> a merge or a rebase. The incremental webrev excludes the unrelated changes
> brought in by the merge/rebase. The pull request contains two additional
> commits since the last revision:
>
>  - Merge branch 'master' into unicode-sequence-in-comments
>  - 8354968: Replace unicode sequences in comment text with UTF-8 characters

src/java.base/share/classes/java/text/Collator.java line 141:

> 139: * considered significant during comparison. The assignment of strengths
> 140: * to language features is locale dependent. A common example is for
> 141: * different accented forms of the same base letter ("a" vs "ä") to be

Since this (and the other one in RuleBasedCollator) is in the explanation of text handling, I think keeping the original code point makes sense.
So I'd have both the UTF-8 character and its Unicode escape notation here.

src/java.base/share/classes/java/text/RuleBasedCollator.java line 594:

> 592: // a three-digit number, one digit for primary, one for secondary, etc.
> 593: //
> 594: // String: A a B é

Maybe "é (\u00e9, e-acute)"?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24727#discussion_r2075933987
PR Review Comment: https://git.openjdk.org/jdk/pull/24727#discussion_r2075935811
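For readers following the thread, a minimal sketch (not part of the PR) of the two facts the review comments rely on: in Java source, the escape `\u00e9` and the literal `é` denote the same character, and the Collator javadoc's "a" vs "ä" example is a SECONDARY difference, so the two strings should compare as equal at PRIMARY strength. The class name `AccentDemo` and the helper `compareAt` are hypothetical; the sketch assumes the file is compiled as UTF-8 (the `javac` default since JDK 18) and uses `Locale.US` only as an example locale.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class; not part of the JDK change under review.
public class AccentDemo {

    // The escape sequence and the literal character yield the same String,
    // a single code point U+00E9. Assumes this file is compiled as UTF-8.
    static boolean escapeMatchesLiteral() {
        return "\u00e9".equals("é") && "é".codePointAt(0) == 0x00E9;
    }

    // Compares two strings with a US-locale Collator at the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        System.out.println("escape equals literal: " + escapeMatchesLiteral());
        // "a" vs "ä" (U+00E4): an accent difference is a SECONDARY difference,
        // so it is ignored at PRIMARY strength and significant at SECONDARY.
        System.out.println("PRIMARY:   " + compareAt(Collator.PRIMARY, "a", "ä"));
        System.out.println("SECONDARY: " + compareAt(Collator.SECONDARY, "a", "ä"));
    }
}
```

The PRIMARY comparison is expected to return 0 and the SECONDARY comparison a non-zero value, which is the behavior the javadoc sentence at Collator.java line 141 describes.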