On Tue, 16 Jun 2026 20:53:07 GMT, Naoto Sato <[email protected]> wrote:
>> vs:
>>
>>
>>     for (int i = 0; i < lastIndex;) {
>>         if (Character.isHighSurrogate(charAt(i++))) {
>>             if (i >= lastIndex) break;
>>             if (Character.isLowSurrogate(charAt(i))) {
>>                 n--;
>>                 i++;
>>             }
>>         }
>>     }
>>
>>
>> - No `else`.
>> - No state variables.
>> - Branch prediction for the second and third `if` statements will succeed
>> 100% of the time for well-formed code unit sequences (normal strings).
>
> Does the suggested code have a bug? I think the code returns 2 for
> "\ud800\udc00". The loop breaks before the last low surrogate.

I concluded that `if (i >= lastIndex) break;` is harmful.

    for (int i = 0; i < lastIndex;) {
        if (Character.isHighSurrogate(charAt(i++))) {
            if (Character.isLowSurrogate(charAt(i))) {
                n--;
                i++;
            }
        }
    }

This should work as intended. After the `i++`, `i` cannot leave the range
`[0, lastIndex]` (that is, `[0, length)`) as long as `i < lastIndex` held
before the increment.
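
For anyone who wants to check this outside the PR, here is a minimal standalone sketch of the loop without the early `break`. It is not the code from the patch: the class name `CodePointCountSketch` and the helper `countCodePoints` are made up for illustration, and it assumes `n` starts at the code unit count and `lastIndex` is `length - 1`, which is how I read the range argument above.

```java
public class CodePointCountSketch {

    // Hypothetical helper for illustration only, not the JDK implementation.
    static int countCodePoints(CharSequence s) {
        int length = s.length();
        int n = length;              // assumption: start from the number of code units
        int lastIndex = length - 1;  // assumption: index of the final code unit

        for (int i = 0; i < lastIndex;) {
            if (Character.isHighSurrogate(s.charAt(i++))) {
                // i < lastIndex held before the increment, so i <= lastIndex
                // here and charAt(i) stays in bounds without an extra check.
                if (Character.isLowSurrogate(s.charAt(i))) {
                    n--;   // a well-formed pair is one code point, not two
                    i++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] samples = {
            "", "a", "\ud800\udc00", "a\ud800\udc00",
            "\ud800", "\ud800\ud800\udc00", "\udc00\ud800"
        };
        for (String s : samples) {
            // Print the sketch's result next to String.codePointCount for comparison.
            System.out.println(countCodePoints(s) + " vs "
                    + s.codePointCount(0, s.length()));
        }
    }
}
```

With the `break` removed, the surrogate pair "\ud800\udc00" is counted as 1, and unpaired surrogates are still counted as one code point each.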
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/26461#discussion_r3427172848