Thank you Peter for sharing your thoughts and experiments!
On 7/29/17 10:22 AM, Peter Levart wrote:
Hi Ivan,
On 07/28/2017 06:25 PM, Ivan Gerasimov wrote:
Hi Peter!
Thank a lot for looking into this!
On 7/28/17 7:32 AM, Peter Levart wrote:
Hi Ivan,
In the light of what Stuart Marks wrote, then what do you think
about a parameterized comparator (would be sub-optimal, I know)
where one would supply
2 Comparator(s) which would be used to compare Ax and Nx
sub-sequences respectively as described below...
Yes. Inspired by what Stuart suggested I made a draft of such a
comparator (see below).
It works, but as you've said it's not that efficient (mostly due to
expensive substrings) and a bit harder to use in a simple case.
Now I need to think about how to combine two approaches.
There is a bug in MyComparator. Comparing "1" with "2" gives 0
(equal). You must not return "equal" when both alpha prefixes are empty.
Yes, you're right. Should have checked the length of numerical
substrings, or just use a separate method for end-of-sequence.
For Nx sub-sequences, one would need a comparator comparing the
reversed sequence lexicographically,
I'm not sure I understand why they need to be reversed.
Scrap that. I got confused. :-o
but for Ax sub-sequences, one could choose from a plethora of
case-(in)sensitive comparator(s) and collators already available on
the platform.
Yes. In the example below I used compareToIgnoreCase to compare alpha
subsequences.
...
Arrays.sort(str,new
MyComparator(Comparator.comparing(String::toString,String::compareToIgnoreCase),Comparator.comparing(String::toString,String::compareTo)));
Substrings are by copy. There is another approach. Originally you
created the NumericComparator to compare CharSequence(s) - not
String(s) as you did with MyComparator. So why not keeping that?
Well, it was mostly for illustrative purposes. Of course the final code
should deal with CharSequences, not Strings for generality. However, I
assume that in most cases this comparator will need to work with
Strings, and it's important to remember that String.subSequence() is as
expensive as String.substring() [1] [1]
http://docs.oracle.com/javase/8/docs/api/java/lang/String.html#subSequence-int-int-
While talking about types, you don't need to have a generic parameter:
NumericComparator<T extends CharSequence> implements
CharSequence<T> simple: NumericComparator implements
Comparator<CharSequence> is suitable for comparing any subtype of
CharSequence as use-sites should accept Comparator<? super String> or
Comparator<? super CharSequence>.
But it wouldn't work in a case you need to pass the comparator to a
function of form fun(Comparator<String> cmp).
For example: Stream<T> sorted(Comparator<? super T> comparator);
You can use Comparator<CharSequence> to sort Stream<String> as well as
Stream<StringBuilder> or Stream<CharSequence>... I tried an approach
where sub-sequences are CharSequence objects by delegation and use
sub-comparators that don't convert CharSequence(s) to String(s).
Here's what this looks like:
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/AlphadecimalComparator.java
with supporting sub-comparator classes:
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/CharSequenceComparator.java
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/peter/DecimalComparator.java
I created a JMH benchmark comparing your 02/webrev
(ivan.NumericComparator):
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/NumericComparator.java
with a modified variant of your MyComparator that works correctly
(ivan.MyComparator):
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/src/main/java/ivan/MyComparator.java
and my (peter.AlphadecimalComparator). The later two are tested with
case-sensitive (CS) and case-insensitive (CIS) variants for comparing
alpha sub-sequences. Here are the results:
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_speed.txt
With -prof gc, it can be seen that JIT is able to optimize-away
allocation of sub-sequence CharSequence(s). When the hot code is
JIT-ed, at least with provided helper sub-comparators, no allocation
is taking place (not so with MyComparator which creates substrings):
http://cr.openjdk.java.net/~plevart/misc/AlphadecimalComparator/alphadecimal-comparator/bench_results_gc.txt
That's a very good news then!
The performance of AlphadecimalComparator is about 50% slower than
your specialized NumericComparator, but it allows custom
parametrization. Perhaps the decimal part of variations could be
hidden behind a boolean or an enum parameter (there is a
leading-zeroes-greater, leading-zeroes-less, but also a
leading-zeroes-equal variant that makes sense). I tried to do that by
making DecimalComparator an enum and restricting the decimalComparato
parameter to that type, but maybe specializing code around a boolean
flag might yield better performance and DecimalComparator as
Comparator<CharSequence> doesn't make much sense outside the
AlphadecimalComparator. So what do you think? Also, what do you think
about the name? Regards, Peter
I've tried to go one step further and created even more abstract
comparator: It uses a supplied predicate to decompose the input
sequences into odd/even subsequences (e.g. alpha/numeric) and then uses
two separate comparator to compare them. Additionally, a comparator for
comparing sequences, consisting only of digits is provided. For example,
to build a case-insensitive AlphaDecimal comparator one could use: 1)
Character::isDigit -- as the predicate for decomposing, 2)
String::compareToIgnoreCase -- to compare alpha (i.e. odd parts); to
work with CharSequences one would need to make it
Comparator.comparing(CharSequence::toString,
String::compareToIgnoreCase), 3) The special decimal-only comparator,
which compares the decimal representation of the sequences. Here's the
file with all the comparators and a simple test:
http://cr.openjdk.java.net/~igerasim/8134512/test/Test.java
--
With kind regards,
Ivan Gerasimov