On 7/20/2020 8:58 PM, naoto.s...@oracle.com wrote:
The short-cut worked well. There's maybe a further optimization we could do to rid us of the performance concern (or the need to run additional performance tests). Consider the cases where strings in comparison don't contain any sup characters, if we make the toLower/UpperCase() block a method and call it before and after the surrogate-check block, the routine would be effectively the same as prior to the introduction of the surrogate-check block, and regular comparisons would suffer the surrogate-check only once (the last check). For strings that do contain sup characters then, the toLower/UpperCase() method would have been called twice, but then we don't care about the performance in that situation. You may call the existing codePointAt method too when an extra getChar and performance is not a concern (but that's your call.

Can you please elaborate this more? What's "the last check" here?

What I meant was that we could expand the 'short-cut' from case sensitive to case insensitive, that is in addition to the line 337, do that line 353 - 370 case-insensitive check as well.

I guess it can be explained better with code. I added inline comment:

        for (int k1 = toffset, k2 = ooffset; k1 < tlast && k2 < olast; k1++, k2++) {
            int cp1 = (int)getChar(value, k1);
            int cp2 = (int)getChar(other, k2);

// does a case-insensitive check:

            if (checkEqual(cp1, cp2) == 0) {
                continue;
            }

// this block will be run once for strings that don't contain any supplementary characters

             // Check for supplementary characters case
            cp1 = getSupplementaryCodePoint(value, cp1, k1, toffset, tlast);
            if ((cp1 & Integer.MIN_VALUE) != 0) {
                k1++;
                cp1 ^= Integer.MIN_VALUE;
            }
            cp2 = getSupplementaryCodePoint(other, cp2, k2, ooffset, olast);
            if ((cp2 & Integer.MIN_VALUE) != 0) {
                k2++;
                cp2 ^= Integer.MIN_VALUE;
            }


// thischeck will have been called twice for strings that contain supplementary characters
// but only one more for strings that don't

            int diff = checkEqual(cp1, cp2);
            if (diff != 0) {
                return diff;
            }
        }
        return tlen - olen;
    }

// the code block between line 353 - 370 in webrev.04 except the last line (return 0):
    private static int checkEqual(int cp1, int cp2) {
        if (cp1 != cp2) {
            // try converting both characters to uppercase.
            // If the results match, then the comparison scan should
            // continue.
            cp1 = Character.toUpperCase(cp1);
            cp2 = Character.toUpperCase(cp2);
            if (cp1 != cp2) {
                // Unfortunately, conversion to uppercase does not work properly                 // for the Georgian alphabet, which has strange rules about case
                // conversion.  So we need to make one last check before
                // exiting.
                cp1 = Character.toLowerCase(cp1);
                cp2 = Character.toLowerCase(cp2);
                if (cp1 != cp2) {
                    return cp1 - cp2;
                }
            }
        }
        return 0;
    }




Naoto

Reply via email to