Hi Ivan,
On 07/28/2017 06:25 PM, Ivan Gerasimov wrote:
Hi Peter!
Thank a lot for looking into this!
On 7/28/17 7:32 AM, Peter Levart wrote:
Hi Ivan,
In the light of what Stuart Marks wrote, then what do you think about
a parameterized comparator (would be sub-optimal, I know) where one
would supply
2 Comparator(s) which would be used to compare Ax and Nx
sub-sequences respectively as described below...
Yes. Inspired by what Stuart suggested I made a draft of such a
comparator (see below).
It works, but as you've said it's not that efficient (mostly due to
expensive substrings) and a bit harder to use in a simple case.
Now I need to think about how to combine two approaches.
There is a bug in MyComparator. Comparing "1" with "2" gives 0 (equal).
You must not return "equal" when both alpha prefixes are empty.
For Nx sub-sequences, one would need a comparator comparing the
reversed sequence lexicographically,
I'm not sure I understand why they need to be reversed.
Scrap that. I got confused. :-o
but for Ax sub-sequences, one could choose from a plethora of
case-(in)sensitive comparator(s) and collators already available on
the platform.
Yes. In the example below I used compareToIgnoreCase to compare alpha
subsequences.
...
Arrays.sort(str,new
MyComparator(Comparator.comparing(String::toString,String::compareToIgnoreCase),Comparator.comparing(String::toString,String::compareTo)));
Substrings are by copy. There is another approach. Originally you
created the NumericComparator to compare CharSequence(s) - not String(s)
as you did with MyComparator. So why not keeping that?
While talking about types, you don't need to have a generic parameter:
NumericComparator<T extends CharSequence> implements CharSequence<T>
simple:
NumericComparator implements Comparator<CharSequence>
is suitable for comparing any subtype of CharSequence as use-sites
should accept Comparator<? super String> or Comparator<? super
CharSequence>. For example:
Stream<T> sorted(Comparator<? super T> comparator);
You can use Comparator<CharSequence> to sort Stream<String> as well as
Stream<StringBuilder> or Stream<CharSequence>...
I tried an approach where sub-sequences are CharSequence objects by
delegation and use sub-comparators that don't convert CharSequence(s) to
String(s). Here's what this looks like:
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/AlphadecimalComparator.java
with supporting sub-comparator classes:
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/CharSequenceComparator.java
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/DecimalComparator.java
I created a JMH benchmark comparing your 02/webrev
(ivan.NumericComparator):
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/NumericComparator.java
with a modified variant of your MyComparator that works correctly
(ivan.MyComparator):
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/MyComparator.java
and my (peter.AlphadecimalComparator). The later two are tested with
case-sensitive (CS) and case-insensitive (CIS) variants for comparing
alpha sub-sequences.
Here are the results:
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_speed.txt
With -prof gc, it can be seen that JIT is able to optimize-away
allocation of sub-sequence CharSequence(s). When the hot code is JIT-ed,
at least with provided helper sub-comparators, no allocation is taking
place (not so with MyComparator which creates substrings):
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_gc.txt
The performance of AlphadecimalComparator is about 50% slower than your
specialized NumericComparator, but it allows custom parametrization.
Perhaps the decimal part of variations could be hidden behind a boolean
or an enum parameter (there is a leading-zeroes-greater,
leading-zeroes-less, but also a leading-zeroes-equal variant that makes
sense). I tried to do that by making DecimalComparator an enum and
restricting the decimalComparato parameter to that type, but maybe
specializing code around a boolean flag might yield better performance
and DecimalComparator as Comparator<CharSequence> doesn't make much
sense outside the AlphadecimalComparator.
So what do you think? Also, what do you think about the name?
Regards, Peter