Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

Ivan Gerasimov Mon, 25 Sep 2017 10:50:29 -0700

Hello!

Could you please review at your convenience?

In the latest webrev I took all suggestions into account (unless I missed something).


http://cr.openjdk.java.net/~igerasim/8134512/04/webrev/

I think that if the suggested comparator is found useful by users, then it may make sense to create a String-oriented variant, which can be implemented through the CharSequence-oriented one as:


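class String {
    ...
    // Possible addition to java.lang.String, delegating to the
    // CharSequence-based Comparators.AlphaDecimalComparator from the webrev.
    @SuppressWarnings("unchecked")
    public static <T extends String> Comparator<T>
    comparingAlphaDecimal(Comparator<? super String> alphaComparator) {
        // The unchecked cast is safe: String.subSequence() returns a String,
        // so alphaComparator only ever receives String arguments.
        return (Comparator<T>) (Comparator)
                new Comparators.AlphaDecimalComparator<>(
                        Objects.requireNonNull((Comparator<CharSequence>) alphaComparator),
                        false);
    }
}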

This will be safe, since the specification guarantees that String.subSequence() returns a String.

Then in application code it would be possible to instantiate the comparators as


        String.comparingAlphaDecimal(String::compareTo);

        String.comparingAlphaDecimal(String::compareToIgnoreCase);

or, alternatively,

        String.comparingAlphaDecimal(Comparator.naturalOrder());

        String.comparingAlphaDecimal(String.CASE_INSENSITIVE_ORDER);

But this could be deferred for later, of course.
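To illustrate the safety argument, here is a small stand-alone sketch (SubSequenceCastDemo, asCharSequenceComparator and comparePrefixes are hypothetical names, not part of the proposal). It views a Comparator<? super String> as a Comparator<CharSequence> through the same kind of unchecked cast as above, and only ever feeds it results of String.subSequence(), which behaves like substring() and therefore yields Strings. Passing, say, a StringBuilder through such a view would typically fail with a ClassCastException, because the underlying comparator expects Strings.

```java
import java.util.Comparator;

// Hypothetical helper illustrating the unchecked-cast argument; not the proposed API.
public class SubSequenceCastDemo {

    // Views a String comparator as a CharSequence comparator. This is only
    // safe if every argument it later receives is really a String.
    @SuppressWarnings("unchecked")
    static Comparator<CharSequence> asCharSequenceComparator(Comparator<? super String> cmp) {
        return (Comparator<CharSequence>) cmp;
    }

    // Compares the first n chars of two Strings through the CharSequence view.
    // String.subSequence(begin, end) behaves like substring(begin, end), so the
    // wrapped comparator only ever sees String instances.
    static int comparePrefixes(String a, String b, int n, Comparator<? super String> cmp) {
        return asCharSequenceComparator(cmp).compare(a.subSequence(0, n), b.subSequence(0, n));
    }
}
```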

With kind regards,
Ivan


On 8/27/17 1:38 PM, Ivan Gerasimov wrote:

Hello everyone!

Here's another iteration of the comparator with suggested improvements.
Now there is only one input argument -- the alpha-comparator for comparing the non-decimal-digit sub-sequences.
For the javadoc I used the text suggested by Peter with some modifications, an additional example and API/implementation notes. Overall, the javadoc looks heavier than needed to me, so I'd love to hear comments about how to make it shorter and cleaner.
Also, I adopted the name AlphaDecimal, suggested by Peter. This name is one of the most popular among the variants found in the wild, so there is a higher chance that users can find the routine by its name.
For testing whether a code point is a decimal digit, I used (Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER), which seems to be more appropriate than Character.isDigit(). (The latter is true for things like a digit in a circle, superscript, etc., which do not seem to be a part of a decimal number composed of several digits.)
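A minimal stand-alone sketch of a comparator along these lines, assuming String inputs (the webrev works on CharSequence) and assuming that a digit run sorts before a non-digit run when the two meet at the same position; AlphaDecimalSketch and alphaDecimal are hypothetical names, not the webrev's code. It splits both strings into maximal runs of decimal-digit and non-digit code points, compares digit runs by numeric value without parsing them (leading zeros are ignored, so "01" and "1" compare equal), and delegates non-digit runs to the single alpha comparator. For what it's worth, the javadoc of Character.isDigit(int) defines a digit by this same DECIMAL_DIGIT_NUMBER category test, so circled and superscript digits, which are in the "other number" category, are rejected by either check.

```java
import java.util.Comparator;
import java.util.Objects;

// Hypothetical stand-alone sketch; not the implementation from the webrev.
public class AlphaDecimalSketch {

    // Decimal digit test from the text: general category Nd, per code point.
    static boolean isDecimalDigit(int cp) {
        return Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER;
    }

    // End index of the run starting at 'from' whose code points all have
    // the same digit/non-digit status 'digits'.
    private static int runEnd(String s, int from, boolean digits) {
        int i = from;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (isDecimalDigit(cp) != digits) break;
            i += Character.charCount(cp);
        }
        return i;
    }

    // Skips leading code points whose decimal value is zero.
    private static int skipZeros(String s, int from, int end) {
        int i = from;
        while (i < end) {
            int cp = s.codePointAt(i);
            if (Character.digit(cp, 10) != 0) break;
            i += Character.charCount(cp);
        }
        return i;
    }

    // Compares two digit runs by value: more significant digits means a larger
    // number; otherwise compare digit by digit. Long runs cannot overflow.
    private static int compareDecimal(String a, int ai, int ae, String b, int bi, int be) {
        ai = skipZeros(a, ai, ae);
        bi = skipZeros(b, bi, be);
        int c = Integer.compare(a.codePointCount(ai, ae), b.codePointCount(bi, be));
        while (c == 0 && ai < ae) {
            int ca = a.codePointAt(ai), cb = b.codePointAt(bi);
            c = Integer.compare(Character.digit(ca, 10), Character.digit(cb, 10));
            ai += Character.charCount(ca);
            bi += Character.charCount(cb);
        }
        return c;
    }

    // The only parameter compares the non-digit runs.
    static Comparator<String> alphaDecimal(Comparator<? super String> alphaComparator) {
        Objects.requireNonNull(alphaComparator);
        return (a, b) -> {
            int i = 0, j = 0;
            while (i < a.length() && j < b.length()) {
                boolean da = isDecimalDigit(a.codePointAt(i));
                boolean db = isDecimalDigit(b.codePointAt(j));
                int ei = runEnd(a, i, da), ej = runEnd(b, j, db);
                int c;
                if (da && db) {
                    c = compareDecimal(a, i, ei, b, j, ej);
                } else if (!da && !db) {
                    c = alphaComparator.compare(a.substring(i, ei), b.substring(j, ej));
                } else {
                    c = da ? -1 : 1; // assumption: digit run before non-digit run
                }
                if (c != 0) return c;
                i = ei;
                j = ej;
            }
            // The string that runs out of runs first sorts first.
            return Boolean.compare(i < a.length(), j < b.length());
        };
    }
}
```

For example, sorting "file10.txt", "file2.txt" and "File1.txt" with alphaDecimal(String.CASE_INSENSITIVE_ORDER) orders them as File1.txt, file2.txt, file10.txt.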
The updated webrev:
http://cr.openjdk.java.net/~igerasim/8134512/04/webrev/

Please review at your convenience.

With kind regards,
Ivan

On 8/9/17 4:59 PM, Stuart Marks wrote:
On 8/1/17 11:56 PM, Ivan Gerasimov wrote:
I've tried to go one step further and created an even more abstract comparator: it uses a supplied predicate to decompose the input sequences into odd/even subsequences (e.g. alpha/numeric) and then uses two separate comparators to compare them. Additionally, a comparator for comparing sequences consisting only of digits is provided. For example, to build a case-insensitive AlphaDecimal comparator one could use:
1) Character::isDigit -- as the predicate for decomposing,
2) String::compareToIgnoreCase -- to compare the alpha (i.e. odd) parts; to work with CharSequences one would need to make it Comparator.comparing(CharSequence::toString, String::compareToIgnoreCase),
3) the special decimal-only comparator, which compares the decimal representation of the sequences.
Here's the file with all the comparators and a simple test:
http://cr.openjdk.java.net/~igerasim/8134512/test/Test.java
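A rough sketch of the two building blocks described in that message, assuming an IntPredicate over code points rather than the Predicate<Character> of the linked Test.java; DecompositionSketch, decompose and CASE_INSENSITIVE_CS are hypothetical names. The decomposition yields a list that alternates between runs matching and not matching the predicate, and the adapter is the Comparator.comparing(CharSequence::toString, String::compareToIgnoreCase) expression quoted above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration of predicate-driven decomposition; not the code in Test.java.
public class DecompositionSketch {

    // Splits s into maximal runs whose code points agree on 'pred', so
    // consecutive parts alternate between matching and non-matching runs.
    static List<String> decompose(CharSequence s, IntPredicate pred) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < s.length()) {
            boolean kind = pred.test(Character.codePointAt(s, start));
            int end = start;
            while (end < s.length()) {
                int cp = Character.codePointAt(s, end);
                if (pred.test(cp) != kind) break;
                end += Character.charCount(cp);
            }
            parts.add(s.subSequence(start, end).toString());
            start = end;
        }
        return parts;
    }

    // Case-insensitive comparator usable on any CharSequence, via toString().
    static final Comparator<CharSequence> CASE_INSENSITIVE_CS =
            Comparator.comparing(CharSequence::toString, String::compareToIgnoreCase);
}
```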
Hi, a couple of follow-up thoughts on this.

1) Supplementary characters
The current code uses Character.isDigit(char), which works only for char values in the BMP (Basic Multilingual Plane, values <= U+FFFF). It won't work for supplementary characters. There are several blocks of digits in the BMP, but there are several more in the supplementary character range.
I don't see any reason not to handle the supplementary characters as well, except that it spoils the nice char-by-char technique of processing the string. Instead, it'd have to pull in code point values, which might be composed of two surrogate chars. There are a variety of methods on Character that help with this. Note that there is an overload Character.isDigit(int) which takes any code point value, including supplementary characters.
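A small sketch of the difference, with hypothetical helper names: scanning char by char tests each surrogate half on its own (neither half is a digit), while scanning by code point with Character.codePointAt and Character.charCount lets the Character.isDigit(int) overload see the whole supplementary character.

```java
// Hypothetical helpers contrasting char-based and code-point-based scanning.
public class CodePointDigits {

    // Tests each char separately: a supplementary digit is stored as two
    // surrogate chars, neither of which is a digit, so it is not counted.
    static int countDigitsByChar(CharSequence s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) count++;
        }
        return count;
    }

    // Walks code points instead, advancing by one or two chars as needed,
    // and uses the Character.isDigit(int) overload.
    static int countDigitsByCodePoint(CharSequence s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = Character.codePointAt(s, i);
            if (Character.isDigit(cp)) count++;
            i += Character.charCount(cp);
        }
        return count;
    }
}
```

For a string made of "7" followed by U+1D7CF (MATHEMATICAL BOLD DIGIT ONE), the char-based count is 1 and the code-point count is 2. Since Java 8 the second loop can also be written as s.codePoints().filter(Character::isDigit).count().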
2) Too much generality?
This version includes Predicate<Character> for determining whether a character is part of the alphabetic or decimal portion of the string. I'm thinking this might be overkill. It might be sufficient to "hardwire" the partitioning predicate to be Character::isDigit and the value mapping function to use Character::digit.
The problem is that adding a predicate opens the door to a lot more complexity, while providing diminishing value. First, the predicate would have to handle code points (per the above), so it'd need to be an IntPredicate. Second, there would also need to be a mapping function from the code point value to a numeric value. This might be an IntUnaryOperator. This would allow someone to sort based on Roman numerals, using Character::getNumericValue. (Yes, Roman numerals are in Unicode.) Or maybe the mapping function should return any Comparable value, not an int. ... See where I'm going here?
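To make the two options concrete, here is a sketch of the "hardwired" predicate/mapping pair next to a pluggable IntPredicate/IntUnaryOperator pair of the kind described; the class and constant names are hypothetical. Character.getNumericValue maps U+216C (ROMAN NUMERAL FIFTY) to 50, while Character.digit returns -1 for it. getNumericValue also maps the Latin letters A-Z/a-z to 10..35, which hints at how quickly a general mapping gets surprising.

```java
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

// Hypothetical constants; the pluggable pair is only what the text argues against.
public class NumericMappingDemo {

    // "Hardwired" choice: partition with Character::isDigit, map with Character.digit.
    static final IntPredicate IS_DIGIT = Character::isDigit;
    static final IntUnaryOperator DIGIT_VALUE = cp -> Character.digit(cp, 10);

    // Pluggable alternative that would also treat Roman numerals as numbers.
    // getNumericValue returns -1 (no numeric value) or -2 (value that is not
    // a non-negative integer, e.g. a fraction).
    static final IntPredicate HAS_NUMERIC_VALUE = cp -> Character.getNumericValue(cp) >= 0;
    static final IntUnaryOperator NUMERIC_VALUE = Character::getNumericValue;
}
```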
Since this kind of sorting is intended to be viewed by people, it's probably worth providing full internationalization support (supplementary characters, and delegation to sub-comparators to allow locale-specific collating sequences). But I start to question any complexity beyond that.
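As one example of such delegation, the alpha sub-comparator could be a java.text.Collator, which implements Comparator<Object> and so fits a Comparator<? super String> parameter. The sketch below is a minimal example under that assumption (CollatorAlphaDemo and alphaComparator are hypothetical names); with PRIMARY strength only base-letter differences count, so case and accent differences are ignored.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical helper: a locale-sensitive comparator for the non-digit runs.
public class CollatorAlphaDemo {

    static Comparator<String> alphaComparator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // PRIMARY strength: case and accent differences are ignored.
        collator.setStrength(Collator.PRIMARY);
        return collator::compare;
    }
}
```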
s'marks


--
With kind regards,
Ivan Gerasimov
