On Tue, 6 May 2025 17:15:34 GMT, Naoto Sato <na...@openjdk.org> wrote:

>> Magnus Ihse Bursie has updated the pull request with a new target base due 
>> to a merge or a rebase. The incremental webrev excludes the unrelated 
>> changes brought in by the merge/rebase. The pull request contains two 
>> additional commits since the last revision:
>> 
>>  - Merge branch 'master' into unicode-sequence-in-comments
>>  - 8354968: Replace unicode sequences in comment text with UTF-8 characters
>
> src/java.base/share/classes/java/text/Collator.java line 141:
> 
>> 139:      * considered significant during comparison. The assignment of strengths
>> 140:      * to language features is locale dependent. A common example is for
>> 141:      * different accented forms of the same base letter ("a" vs "ä") to be
> 
> Since this (and the other one in RuleBasedCollator) is in the explanation of 
> text handling, I think keeping the original code point makes sense. So I'd 
> have both UTF-8 string and its Unicode escape notation here.

I'm not sure what you mean by "both" here. Do you mean something along the 
lines of `é (\u00e9, e-acute)` as you suggested below? An additional 
complication here is that this is part of a javadoc block. I assumed (but must 
admit that I have not checked) that Javadoc will replace the `\u00E4` notation 
with the actual Unicode character in the generated HTML. If so, there should be 
no difference in the generated javadoc between the original `"\u00E4"` and my 
suggested patch `"ä"`. (There is a difference for someone reading 
Collator.java directly, though.)

If I am right, and you want the generated Javadoc to show the literal 
`\u00E4`, I assume you would need to escape the backslash.

But then again, perhaps I am not correct and javadoc keeps the `\u00E4` as a 
literal. I'd have to check that.
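One way to check this without generating full docs is to parse a doc comment 
through the `com.sun.source.util.DocTrees` API, which javadoc reads comments 
through. This is only a sketch: it assumes a full JDK (so that 
`ToolProvider.getSystemJavaCompiler()` is non-null), and the class name 
`DocEscapeCheck` and the `docText` helper are hypothetical. Per JLS §3.3, 
Unicode escapes are translated before comments are recognized, so the first 
call should yield text containing `ä`. In the second call, the backslash before 
`u` is preceded by another backslash, so it is not an escape, and both 
backslashes should survive as `\\u00E4`.

```java
import com.sun.source.doctree.DocCommentTree;
import com.sun.source.doctree.DocTree;
import com.sun.source.doctree.TextTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;

public class DocEscapeCheck {

    // Hypothetical helper: parses one compilation unit held in a string and
    // returns the plain text of the doc comment on its first top-level type,
    // as seen through the DocTrees API.
    static String docText(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///A.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(file));
        DocTrees trees = DocTrees.instance(task);
        try {
            for (CompilationUnitTree unit : task.parse()) {
                for (Tree type : unit.getTypeDecls()) {
                    DocCommentTree doc =
                            trees.getDocCommentTree(TreePath.getPath(unit, type));
                    // Concatenate the text nodes of the comment body.
                    StringBuilder text = new StringBuilder();
                    for (DocTree node : doc.getFullBody()) {
                        if (node.getKind() == DocTree.Kind.TEXT) {
                            text.append(((TextTree) node).getBody());
                        }
                    }
                    return text.toString();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        throw new IllegalArgumentException("no type declaration in source");
    }

    public static void main(String[] args) {
        // The generated source contains a real Unicode escape for a-umlaut.
        String escaped = docText("/** Compare \"\\u00E4\" with \"a\". */ class A {}");
        // The generated source contains two backslashes, which is not an escape.
        String doubled = docText("/** Compare \"\\\\u00E4\" with \"a\". */ class A {}");
        System.out.println(escaped);
        System.out.println(doubled);
    }
}
```

If this behaves as expected, the escaped form is indistinguishable from a 
literal `ä` by the time javadoc sees it, while doubling the backslash keeps the 
notation but also leaves both backslashes in the comment text.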

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24727#discussion_r2075997573
