Re: RFR: 8354968: Replace unicode sequences in comment text with UTF-8 characters [v2]

Naoto Sato Tue, 06 May 2025 11:09:48 -0700

On Tue, 6 May 2025 17:59:18 GMT, Magnus Ihse Bursie <[email protected]> wrote:


>> src/java.base/share/classes/java/text/Collator.java line 141:
>> 
>>> 139:      * considered significant during comparison. The assignment of strengths
>>> 140:      * to language features is locale dependent. A common example is for
>>> 141:      * different accented forms of the same base letter ("a" vs "ä") to be
>> 
>> Since this (and the other one in RuleBasedCollator) is in the explanation of 
>> text handling, I think keeping the original code point makes sense. So I'd 
>> have both UTF-8 string and its Unicode escape notation here.
>
> I'm not sure what you mean by "both" here. Do you mean something along the 
> lines of `é (\u00e9, e-acute)` as you suggested below? An additional 
> complication here is that this is part of a javadoc block. I assumed (but 
> must admit that I have not checked) that the `\u00E4` notation will be 
> replaced with unicode characters by Javadoc in the generated html. If so, 
> there should be no difference in the generated javadoc between the original 
> `"\u00E4"` and my suggested patch `"ä"`. (There is a change for someone 
> reading the code directly in Collator.java, though).
> 
> If I am right, and if you want the generated Javadoc to contain `\u00E4`, I 
> assume you would need to escape the backslash. 
> 
> But then again, perhaps I am not correct and javadoc keeps the `\u00E4` as a 
> literal. I'd have to check that.

Yes, I meant literally `\u00e9` or `\u00E4`, but I think it is better described 
in the `U+00E9` form, which emphasizes the code point. So in this case (ä is 
U+00E4), I'd suggest

"a" vs "ä" (U+00E4)
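
For reference, a minimal sketch of how the two notations relate, assuming a plain Java source file. The class name `CodePointDemo` and the helper `codePointLabel` are hypothetical, made up for illustration and not part of the patch. The literal is written with an escape so the result does not depend on the source file encoding.

```java
public class CodePointDemo {
    // Hypothetical helper: formats the first code point of s in U+XXXX notation.
    public static String codePointLabel(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        // The compiler translates the Unicode escape before lexing,
        // so this literal holds a single character (a with diaeresis).
        String aUmlaut = "\u00E4";
        System.out.println(aUmlaut.length());
        System.out.println(codePointLabel(aUmlaut));

        // e with acute accent, the other character mentioned in this thread.
        System.out.println(codePointLabel("\u00E9"));
    }
}
```

This only shows the compiler's handling of the escape in a string literal. Whether javadoc renders the escape or keeps it as a literal in the generated HTML would still need to be checked separately.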

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24727#discussion_r2076008258
