Re: RFR: 8321283: Reuse StringLatin1::equals in regionMatches

Claes Redestad Wed, 26 Nov 2025 12:15:34 -0800

On Sun, 3 Dec 2023 05:58:27 GMT, Francesco Nigro <[email protected]> wrote:


>> src/java.base/share/classes/java/lang/String.java line 2186:
>> 
>>> 2184:         if (coder == otherCoder) {
>>> 2185:             if (ooffset == 0 && toffset == 0 && len == (tv.length >> 
>>> coder) && ov.length == tv.length) {
>>> 2186:                 return StringLatin1.equals(tv, ov);
>> 
>> A few questions on this:
>> 
>> - Should this also handle `UTF-16`?
>> - Could we save a comparison by `OR`ing offsets?
>> - The `equals` method will do the `tv` and `ov` length comparisons 
>> themselves, could we remove that check here or is the method call overhead 
>> too large?
>> - Do you happen to have results for `make test 
>> TEST="micro:java.lang.StringComparisons"`?
>> 
>> Suggestion:
>> 
>>             if ((ooffset | toffset) == 0 && len == (tv.length >> coder)) {
>>                 return coder == LATIN1
>>                   ? StringLatin1.equals(tv, ov)
>>                   : StringUTF16.equals(tv, ov);
>>             }
>
>> Could we save a comparison by ORing offsets?
> 
> I'm ok to use the or to save an additional offset comparison, good idea!
> 
>> Should this also handle UTF-16?
> 
> Currently String::equals doesn't use the UTF16 variant of the intrinsic, see 
> https://github.com/openjdk/jdk/blob/949846986f572dfb82912e7d71e7bfd37a90871e/src/java.base/share/classes/java/lang/String.java#L1879-L1880;
>  I think it's because checking byte per byte equality (from a semantic 
> perspective), still hold. I didn't checked what are the other usage for the 
> UTF16 version, but @cl4es could clarify it .
> 
>> The equals method will do the tv and ov length comparisons themselves, could 
>> we remove that check here or is the method call overhead too large?
> 
> I didn't remove it because region equality won't return (always) false, as 
> equals would do, in such case, but will compare the (If existing) minimal 
> common available length (till the specified len) between the two.
> 
>> Do you happen to have results for make test 
>> TEST="micro:java.lang.StringComparisons"?
> 
> Not yet, I have performed a microbenchmark myself to verify that with small 
> strings (below 16 bytes), equals was much better (both by enabling and 
> disabling the mismatch intrinsics), but I will do it, thanks!
> 
> I'm missing as well to verify the bytecode size of the method after my 
> changes, to make sure I didn't make it non-inlineable.

Yes, since the `coder` is checked first we know the bytes compared are either 
both latin1 or both UTF16 so there's no need for a UTF-16-aware char-by-char 
comparison here. In fact the `StringUTF16.equals` method looks completely 
unused.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16933#discussion_r1413697276

Re: RFR: 8321283: Reuse StringLatin1::equals in regionMatches

Reply via email to