Thank you Peter for sharing your thoughts and experiments!


On 7/29/17 10:22 AM, Peter Levart wrote:
Hi Ivan,

On 07/28/2017 06:25 PM, Ivan Gerasimov wrote:

Hi Peter!

Thank a lot for looking into this!

On 7/28/17 7:32 AM, Peter Levart wrote:
Hi Ivan,

In the light of what Stuart Marks wrote, then what do you think about a parameterized comparator (would be sub-optimal, I know) where one would supply 2 Comparator(s) which would be used to compare Ax and Nx sub-sequences respectively as described below...

Yes. Inspired by what Stuart suggested I made a draft of such a comparator (see below). It works, but as you've said it's not that efficient (mostly due to expensive substrings) and a bit harder to use in a simple case.
Now I need to think about how to combine two approaches.

There is a bug in MyComparator. Comparing "1" with "2" gives 0 (equal). You must not return "equal" when both alpha prefixes are empty.

Yes, you're right. Should have checked the length of numerical substrings, or just use a separate method for end-of-sequence.


For Nx sub-sequences, one would need a comparator comparing the reversed sequence lexicographically,
I'm not sure I understand why they need to be reversed.

Scrap that. I got confused.  :-o

but for Ax sub-sequences, one could choose from a plethora of case-(in)sensitive comparator(s) and collators already available on the platform.

Yes. In the example below I used compareToIgnoreCase to compare alpha subsequences.
...
Arrays.sort(str,new MyComparator(Comparator.comparing(String::toString,String::compareToIgnoreCase),Comparator.comparing(String::toString,String::compareTo)));
Substrings are by copy. There is another approach. Originally you created the NumericComparator to compare CharSequence(s) - not String(s) as you did with MyComparator. So why not keeping that?
Well, it was mostly for illustrative purposes. Of course the final code should deal with CharSequences, not Strings for generality. However, I assume that in most cases this comparator will need to work with Strings, and it's important to remember that String.subSequence() is as expensive as String.substring() [1] [1] http://docs.oracle.com/javase/8/docs/api/java/lang/String.html#subSequence-int-int-
While talking about types, you don't need to have a generic parameter: NumericComparator<T extends CharSequence> implements CharSequence<T> simple: NumericComparator implements Comparator<CharSequence> is suitable for comparing any subtype of CharSequence as use-sites should accept Comparator<? super String> or Comparator<? super CharSequence>.
But it wouldn't work in a case you need to pass the comparator to a function of form fun(Comparator<String> cmp).
For example: Stream<T> sorted(Comparator<? super T> comparator); You can use Comparator<CharSequence> to sort Stream<String> as well as Stream<StringBuilder> or Stream<CharSequence>... I tried an approach where sub-sequences are CharSequence objects by delegation and use sub-comparators that don't convert CharSequence(s) to String(s). Here's what this looks like: http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/AlphadecimalComparator.java with supporting sub-comparator classes: http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/CharSequenceComparator.java http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/DecimalComparator.java I created a JMH benchmark comparing your 02/webrev (ivan.NumericComparator): http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/NumericComparator.java with a modified variant of your MyComparator that works correctly (ivan.MyComparator): http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/MyComparator.java and my (peter.AlphadecimalComparator). The later two are tested with case-sensitive (CS) and case-insensitive (CIS) variants for comparing alpha sub-sequences. Here are the results: http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_speed.txt With -prof gc, it can be seen that JIT is able to optimize-away allocation of sub-sequence CharSequence(s). When the hot code is JIT-ed, at least with provided helper sub-comparators, no allocation is taking place (not so with MyComparator which creates substrings): http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_gc.txt

That's a very good news then!
The performance of AlphadecimalComparator is about 50% slower than your specialized NumericComparator, but it allows custom parametrization. Perhaps the decimal part of variations could be hidden behind a boolean or an enum parameter (there is a leading-zeroes-greater, leading-zeroes-less, but also a leading-zeroes-equal variant that makes sense). I tried to do that by making DecimalComparator an enum and restricting the decimalComparato parameter to that type, but maybe specializing code around a boolean flag might yield better performance and DecimalComparator as Comparator<CharSequence> doesn't make much sense outside the AlphadecimalComparator. So what do you think? Also, what do you think about the name? Regards, Peter
I've tried to go one step further and created even more abstract comparator: It uses a supplied predicate to decompose the input sequences into odd/even subsequences (e.g. alpha/numeric) and then uses two separate comparator to compare them. Additionally, a comparator for comparing sequences, consisting only of digits is provided. For example, to build a case-insensitive AlphaDecimal comparator one could use: 1) Character::isDigit -- as the predicate for decomposing, 2) String::compareToIgnoreCase -- to compare alpha (i.e. odd parts); to work with CharSequences one would need to make it Comparator.comparing(CharSequence::toString, String::compareToIgnoreCase), 3) The special decimal-only comparator, which compares the decimal representation of the sequences. Here's the file with all the comparators and a simple test: http://cr.openjdk.java.net/~igerasim/8134512/test/Test.java

--
With kind regards,
Ivan Gerasimov

Reply via email to