On Sun, 3 Dec 2023 05:58:27 GMT, Francesco Nigro <[email protected]> wrote:
>> src/java.base/share/classes/java/lang/String.java line 2186:
>>
>>> 2184: if (coder == otherCoder) {
>>> 2185: if (ooffset == 0 && toffset == 0 && len == (tv.length >>
>>> coder) && ov.length == tv.length) {
>>> 2186: return StringLatin1.equals(tv, ov);
>>
>> A few questions on this:
>>
>> - Should this also handle `UTF-16`?
>> - Could we save a comparison by `OR`ing offsets?
>> - The `equals` method will do the `tv` and `ov` length comparisons
>> themselves, could we remove that check here or is the method call overhead
>> too large?
>> - Do you happen to have results for `make test
>> TEST="micro:java.lang.StringComparisons"`?
>>
>> Suggestion:
>>
>> if ((ooffset | toffset) == 0 && len == (tv.length >> coder)) {
>> return coder == LATIN1
>> ? StringLatin1.equals(tv, ov)
>> : StringUTF16.equals(tv, ov);
>> }
>
>> Could we save a comparison by ORing offsets?
>
> I'm ok to use the or to save an additional offset comparison, good idea!
>
>> Should this also handle UTF-16?
>
> Currently String::equals doesn't use the UTF16 variant of the intrinsic, see
> https://github.com/openjdk/jdk/blob/949846986f572dfb82912e7d71e7bfd37a90871e/src/java.base/share/classes/java/lang/String.java#L1879-L1880;
> I think it's because checking byte per byte equality (from a semantic
> perspective), still hold. I didn't checked what are the other usage for the
> UTF16 version, but @cl4es could clarify it .
>
>> The equals method will do the tv and ov length comparisons themselves, could
>> we remove that check here or is the method call overhead too large?
>
> I didn't remove it because region equality won't return (always) false, as
> equals would do, in such case, but will compare the (If existing) minimal
> common available length (till the specified len) between the two.
>
>> Do you happen to have results for make test
>> TEST="micro:java.lang.StringComparisons"?
>
> Not yet, I have performed a microbenchmark myself to verify that with small
> strings (below 16 bytes), equals was much better (both by enabling and
> disabling the mismatch intrinsics), but I will do it, thanks!
>
> I'm missing as well to verify the bytecode size of the method after my
> changes, to make sure I didn't make it non-inlineable.
Yes, since the `coder` is checked first we know the bytes compared are either
both latin1 or both UTF16 so there's no need for a UTF-16-aware char-by-char
comparison here. In fact the `StringUTF16.equals` method looks completely
unused.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/16933#discussion_r1413697276