Re: [10] RFR 8134512 : provide Alpha-Numeric (logical) Comparator

Stuart Marks Wed, 09 Aug 2017 17:01:04 -0700

On 8/1/17 11:56 PM, Ivan Gerasimov wrote:

I've tried to go one step further and created an even more abstract comparator. It
uses a supplied predicate to decompose the input sequences into odd/even
subsequences (e.g. alpha/numeric) and then uses two separate comparators to
compare them. Additionally, a comparator for comparing sequences consisting
only of digits is provided. For example, to build a case-insensitive
AlphaDecimal comparator one could use:
1) Character::isDigit -- as the predicate for decomposing,
2) String::compareToIgnoreCase -- to compare alpha (i.e. odd) parts; to work
with CharSequences one would need to make it
Comparator.comparing(CharSequence::toString, String::compareToIgnoreCase),
3) the special decimal-only comparator, which compares the decimal representation
of the sequences.
Here's the file with all the comparators and a simple test:
http://cr.openjdk.java.net/~igerasim/8134512/test/Test.java
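For reference, a decimal-only comparator like the one in point 3) might look roughly like this. It is a hypothetical sketch, not the code in Test.java: the class name is made up, both inputs are assumed to contain only decimal digits, and only BMP chars are handled. It compares by value without parsing into a fixed-width integer, so arbitrarily long runs cannot overflow.

```java
import java.util.Comparator;

// Hypothetical sketch of a "decimal-only" comparator. Both inputs are assumed
// to consist solely of decimal digits; this version handles BMP chars only.
final class DecimalOnlyComparator implements Comparator<CharSequence> {

    // Index of the first non-zero digit, or s.length() if all digits are zero.
    private static int skipLeadingZeros(CharSequence s) {
        int i = 0;
        while (i < s.length() && Character.digit(s.charAt(i), 10) == 0) {
            i++;
        }
        return i;
    }

    @Override
    public int compare(CharSequence a, CharSequence b) {
        int ia = skipLeadingZeros(a);
        int ib = skipLeadingZeros(b);
        // More significant digits means a larger value.
        int lenA = a.length() - ia;
        int lenB = b.length() - ib;
        if (lenA != lenB) {
            return Integer.compare(lenA, lenB);
        }
        // Same number of significant digits: compare digit values left to right.
        for (; ia < a.length(); ia++, ib++) {
            int da = Character.digit(a.charAt(ia), 10);
            int db = Character.digit(b.charAt(ib), 10);
            if (da != db) {
                return Integer.compare(da, db);
            }
        }
        return 0;
    }
}
```

Because Character.digit understands all Unicode decimal digits, "\u0661\u0662" (Arabic-Indic 12) compares equal to "12". Note that "007" and "7" also compare equal here; a comparator meant to be consistent with equals would need a tiebreak.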


Hi, a couple follow-up thoughts on this.

1) Supplementary characters

The current code uses Character.isDigit(char), which works only for char values in the BMP (basic multilingual plane, values <= U+FFFF). It won't work for supplementary characters. There are several blocks of digits in the BMP, but there are several more in the supplementary character range.
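A small illustration of the problem (the class and method names are made up for the example): U+1D7CE MATHEMATICAL BOLD DIGIT ZERO is a decimal digit, but in a String it is stored as a surrogate pair, and neither surrogate is a digit on its own.

```java
// Illustration: a supplementary digit is invisible to Character.isDigit(char)
// because each half of its surrogate pair is checked separately.
final class SupplementaryDigitDemo {

    // Char-by-char scan: uses the isDigit(char) overload (BMP only).
    static boolean anyCharIsDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Code-point scan: the method reference resolves to isDigit(int).
    static boolean anyCodePointIsDigit(String s) {
        return s.codePoints().anyMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        String boldZero = new String(Character.toChars(0x1D7CE));
        System.out.println(boldZero.length());              // 2 (surrogate pair)
        System.out.println(anyCharIsDigit(boldZero));       // false
        System.out.println(anyCodePointIsDigit(boldZero));  // true
        System.out.println(Character.digit(0x1D7CE, 10));   // 0
    }
}
```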

I don't see any reason not to handle the supplementary characters as well, except that it spoils the nice char-by-char technique of processing the string. Instead, it'd have to pull in code point values, which might consist of two surrogate chars. There are a variety of methods on Character that help with this. Note that there is an overload Character.isDigit(int) which takes any code point value, including supplementary characters.
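A minimal sketch of the code-point-based traversal, assuming the goal is to split a string into maximal digit and non-digit runs (the class and method names are hypothetical). It advances with Character.charCount so a surrogate pair is always consumed as one unit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a string into maximal runs of digits and
// non-digits, stepping by code point so surrogate pairs stay together.
final class CodePointRuns {
    static List<String> runs(String s) {
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start < s.length()) {
            int cp = s.codePointAt(start);
            boolean digitRun = Character.isDigit(cp);   // isDigit(int) overload
            int end = start + Character.charCount(cp);  // 1 or 2 chars
            while (end < s.length()) {
                int next = s.codePointAt(end);
                if (Character.isDigit(next) != digitRun) {
                    break;  // run type changes here
                }
                end += Character.charCount(next);
            }
            result.add(s.substring(start, end));
            start = end;
        }
        return result;
    }
}
```

For example, runs("file10.txt") yields [file, 10, .txt], and a run of supplementary digits such as U+1D7CF U+1D7D0 comes out as a single four-char digit run.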


2) Too much generality?

This version includes a Predicate<Character> for determining whether a character is part of the alphabetic or decimal portion of the string. I'm thinking this might be overkill. It might be sufficient to "hardwire" the partitioning predicate to be Character::isDigit and the value mapping function to use Character::digit.
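A sketch of what the hardwired variant could look like, assuming digit runs are detected with Character.isDigit(int) and valued with Character.digit(int, 10), while all other characters are compared by code point. The class name is hypothetical, and plain code-point order stands in for whatever alphabetic ordering a real implementation would use.

```java
import java.util.Comparator;

// Hypothetical "hardwired" alpha-decimal comparator: no pluggable predicate.
final class HardwiredAlphaDecimalComparator implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (Character.isDigit(ca) && Character.isDigit(cb)) {
                int ea = endOfDigits(a, i);
                int eb = endOfDigits(b, j);
                int r = compareDigitRuns(a, i, ea, b, j, eb);
                if (r != 0) return r;
                i = ea;
                j = eb;
            } else {
                if (ca != cb) return Integer.compare(ca, cb);
                i += Character.charCount(ca);
                j += Character.charCount(cb);
            }
        }
        // The string with characters left over sorts last.
        return Boolean.compare(i < a.length(), j < b.length());
    }

    // Index just past the digit run starting at 'from'.
    private static int endOfDigits(String s, int from) {
        int k = from;
        while (k < s.length() && Character.isDigit(s.codePointAt(k))) {
            k += Character.charCount(s.codePointAt(k));
        }
        return k;
    }

    // Compares two digit runs by numeric value, ignoring leading zeros.
    private static int compareDigitRuns(String a, int i, int ea, String b, int j, int eb) {
        i = skipZeros(a, i, ea);
        j = skipZeros(b, j, eb);
        int na = a.codePointCount(i, ea);
        int nb = b.codePointCount(j, eb);
        if (na != nb) return Integer.compare(na, nb);
        while (i < ea) {
            int da = Character.digit(a.codePointAt(i), 10);
            int db = Character.digit(b.codePointAt(j), 10);
            if (da != db) return Integer.compare(da, db);
            i += Character.charCount(a.codePointAt(i));
            j += Character.charCount(b.codePointAt(j));
        }
        return 0;
    }

    private static int skipZeros(String s, int from, int end) {
        while (from < end && Character.digit(s.codePointAt(from), 10) == 0) {
            from += Character.charCount(s.codePointAt(from));
        }
        return from;
    }
}
```

With this, "file1", "file2", "file10" sort in numeric order, and "v\u0662" (Arabic-Indic 2) sorts before "v10". As written, "a01" and "a1" compare equal, so a tiebreak would still be needed for consistency with equals.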

The problem is that adding a predicate opens the door to a lot more complexity, while providing diminishing value. First, the predicate would have to handle code points (per the above) so it'd need to be an IntPredicate. Second, there would also need to be a mapping function from the code point value to a numeric value. This might be an IntUnaryOperator. This would allow someone to sort based on Roman numerals, using Character::getNumericValue. (Yes, Roman numerals are in Unicode.) Or maybe the mapping function should return any Comparable value, not an int. ... See where I'm going here?
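To make the growth in knobs concrete, here is an illustrative pairing of an IntPredicate with an IntUnaryOperator. The field names are hypothetical, not a proposed API. U+216C ROMAN NUMERAL FIFTY is not a decimal digit, so the hardwired pair ignores it; a "numeric-aware" pair has to change both the predicate and the mapping. It also shows one trap: Character.getNumericValue returns 10 through 35 for the Latin letters A-Z, so the predicate cannot simply be "getNumericValue >= 0". And since Roman numerals are not positional, it is not even clear how a run of them should be combined into one value.

```java
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

// Illustration of the extra parameters a fully general comparator would need.
// All names here are hypothetical.
final class GeneralizedPartitionDemo {

    // Hardwired choice: decimal digits, valued with Character.digit.
    static final IntPredicate IS_DECIMAL = Character::isDigit;
    static final IntUnaryOperator DECIMAL_VALUE = cp -> Character.digit(cp, 10);

    // A broader choice: decimal digits plus letter numbers such as Roman numerals.
    // Note: getNumericValue('a') is 10, so testing getNumericValue(cp) >= 0
    // would wrongly treat Latin letters as numbers.
    static final IntPredicate IS_NUMERIC =
            cp -> Character.isDigit(cp) || Character.getType(cp) == Character.LETTER_NUMBER;
    static final IntUnaryOperator NUMERIC_VALUE = Character::getNumericValue;

    public static void main(String[] args) {
        int romanFifty = 0x216C;  // ROMAN NUMERAL FIFTY
        System.out.println(IS_DECIMAL.test(romanFifty));           // false
        System.out.println(DECIMAL_VALUE.applyAsInt(romanFifty));  // -1
        System.out.println(IS_NUMERIC.test(romanFifty));           // true
        System.out.println(NUMERIC_VALUE.applyAsInt(romanFifty));  // 50
    }
}
```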

Since this kind of sorting is intended to be viewed by people, it's probably worth providing full internationalization support (supplementary characters, and delegation to sub-comparators, to allow locale-specific collating sequences). But I start to question any complexity beyond that.
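One possible shape for that middle ground, as a sketch: the digit handling stays hardwired (code point aware), and only the ordering of the non-digit runs is delegated to a caller-supplied comparator, for example a java.text.Collator for a given locale. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical factory: digit runs are compared numerically; everything else
// is delegated to the supplied comparator (e.g. a locale-specific Collator).
final class DelegatingAlphaDecimal {

    static Comparator<String> of(Comparator<? super String> alphaOrder) {
        return (a, b) -> {
            List<String> ra = runs(a);
            List<String> rb = runs(b);
            for (int k = 0; k < Math.min(ra.size(), rb.size()); k++) {
                String x = ra.get(k), y = rb.get(k);
                int r = isDigitRun(x) && isDigitRun(y)
                        ? compareNumeric(x, y)
                        : alphaOrder.compare(x, y);
                if (r != 0) return r;
            }
            return Integer.compare(ra.size(), rb.size());
        };
    }

    // Runs are never empty, so inspecting the first code point is safe.
    private static boolean isDigitRun(String run) {
        return Character.isDigit(run.codePointAt(0));
    }

    private static int compareNumeric(String x, String y) {
        String p = normalize(x), q = normalize(y);
        return p.length() != q.length()
                ? Integer.compare(p.length(), q.length())
                : p.compareTo(q);
    }

    // Maps each digit code point to its ASCII digit and drops leading zeros,
    // keeping a single "0" for an all-zero run.
    private static String normalize(String run) {
        StringBuilder sb = new StringBuilder();
        run.codePoints().forEach(cp -> sb.append((char) ('0' + Character.digit(cp, 10))));
        int k = 0;
        while (k < sb.length() - 1 && sb.charAt(k) == '0') {
            k++;
        }
        return sb.substring(k);
    }

    // Splits into maximal digit / non-digit runs, stepping by code point.
    private static List<String> runs(String s) {
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start < s.length()) {
            int cp = s.codePointAt(start);
            boolean digitRun = Character.isDigit(cp);
            int end = start + Character.charCount(cp);
            while (end < s.length() && Character.isDigit(s.codePointAt(end)) == digitRun) {
                end += Character.charCount(s.codePointAt(end));
            }
            result.add(s.substring(start, end));
            start = end;
        }
        return result;
    }
}
```

Usage: DelegatingAlphaDecimal.of(Collator.getInstance(Locale.ENGLISH)) puts "apple2" before "Banana1", whereas DelegatingAlphaDecimal.of(String::compareTo) puts it after, because 'B' precedes 'a' in UTF-16 order. The numeric handling is the same in both cases.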


s'marks
