Hi all,
I've gathered statistics for the DaCapo jython bench (the worst DaCapo
bench from a performance point of view) and identified several places for
optimization. For every hot spot I created a small test case; you can
find them below, along with the estimated speedup for each case. I believe
the optimizations below could significantly improve the current "horrible"
situation for jython (7570 on DRLVM vs. 2916 on Sun 1.6).
Throwing/catching exceptions (HARMONY-4549 was created to track the issue)
Expected boost: 700 ms = ~5-7% of the overall jython bench
Description: Raising/catching exceptions is very slow in comparison
with Sun. The TryRaiseExcept sub-bench of the jython bench throws and
catches thousands of exceptions, and as you can see from the numbers
below, it runs more than 3 times slower on DRLVM (1641 vs. 500 msec).
AFAIU, since the catch block performs some operations on the exception
object, the VM unwinds the stack every time an exception is caught.
Small test case (each public class goes into its own .java file):
public class TestExceptions {
    public static void main(String[] args) {
        // warm up the VM first
        tryRaiseExceptions(1);
        long start = System.currentTimeMillis();
        tryRaiseExceptions(1000000);
        long res = System.currentTimeMillis() - start;
        System.out.println("completed in " + res + " msec");
    }

    public static void tryRaiseExceptions(int n) {
        for (int i = 0; i < n; i++) {
            try {
                throw new TException();
            } catch (TException throwable) {
                // operate on the caught exception object (pass it to another method)
                TException ts = Test2.test(throwable);
            }
        }
    }
}

public class Test2 {
    public static TException test(TException thr) {
        return thr;
    }
}

public class TException extends RuntimeException {
}
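The test case above does not say whether the time goes into capturing the stack trace when the exception is created or into unwinding when it is caught. A minimal diagnostic sketch, assuming a JVM where overriding Throwable.fillInStackTrace() suppresses stack-trace capture: it times the same throw/catch loop with TException and with a hypothetical LightTException that skips the capture. ExceptionCostSketch, LightTException and identity() are illustrative names only, not part of the original tests or of any proposed fix.

```java
public class ExceptionCostSketch {
    // Same shape as TException from the test case.
    static class TException extends RuntimeException {
    }

    // Hypothetical variant: does not walk the stack when it is constructed.
    static class LightTException extends RuntimeException {
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // skip stack-trace capture entirely
        }
    }

    // Stands in for Test2.test(...): touches the caught exception object.
    static Object identity(Object o) {
        return o;
    }

    static long timeFull(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            try {
                throw new TException();
            } catch (TException e) {
                identity(e);
            }
        }
        return System.nanoTime() - start;
    }

    static long timeLight(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            try {
                throw new LightTException();
            } catch (LightTException e) {
                identity(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1000000;
        timeFull(10000);  // warm up both paths before timing
        timeLight(10000);
        System.out.println("with stack trace:    " + timeFull(n) / 1000000 + " msec");
        System.out.println("without stack trace: " + timeLight(n) / 1000000 + " msec");
    }
}
```

If the two numbers are close, the cost is likely in the throw/catch path itself rather than in stack-trace capture.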
System.identityHashCode re-implementation using magics (HARMONY-4551)
Expected boost: 1000 ms = ~10% overall
Description: System.identityHashCode() is frequently used in the
jython bench (more than 22,000,000 invocations). The reason for so many
invocations is the use of IdentityHashMap for storing ThreadLocal objects.
I assume the method could be implemented through magics; small
experiments with the following (incorrect) implementation show a huge
speedup on the small test case (from 1609 msec for the unpatched version
to 409 msec for the patched one):
return ObjectReference.fromObject(object).toAddress().toInt();
Small test case:
public class test {
    public static void main(String[] args) {
        // warm up the VM first
        runTest(1000, new Object());
        long start = System.currentTimeMillis();
        runTest(10000000, new Object());
        long end = System.currentTimeMillis() - start;
        System.out.println("completed in " + end);
    }

    public static void runTest(int num, Object obj) {
        // obj is not used; each iteration hashes a freshly allocated object
        for (int i = 0; i < num; i++) {
            System.identityHashCode(new Object());
        }
    }
}
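System.identityHashCode() must return the same value for a given object for its whole lifetime, just like the default Object.hashCode(). A raw address breaks this as soon as a moving (copying or compacting) GC relocates the object, which is presumably why the magics snippet above is called incorrect; a magics-based version would still have to keep the value stable across object moves. The sketch below checks that property on whatever JVM runs it. IdentityHashStability is a hypothetical name, and since System.gc() is only a hint, a passing run does not prove that any object actually moved.

```java
import java.util.ArrayList;
import java.util.List;

public class IdentityHashStability {
    // Records identityHashCode for a batch of live objects, creates heap
    // pressure and requests a GC, then checks every hash is unchanged.
    static boolean hashesStableAcrossGc(int count) {
        List<Object> objects = new ArrayList<Object>();
        int[] before = new int[count];
        for (int i = 0; i < count; i++) {
            Object o = new Object();
            objects.add(o);
            before[i] = System.identityHashCode(o);
        }
        // Allocate short-lived garbage so that a moving collector may run
        // and relocate the objects recorded above.
        for (int i = 0; i < 100000; i++) {
            byte[] garbage = new byte[128];
            garbage[0] = (byte) i;
        }
        System.gc(); // only a hint; the VM may ignore it
        for (int i = 0; i < count; i++) {
            if (System.identityHashCode(objects.get(i)) != before[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("stable: " + hashesStableAcrossGc(1000));
    }
}
```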
Instanceof modification (HARMONY-4552)
Expected boost: 700 ms = ~5-7%
Description: instanceof is used in many places in DaCapo, but the hottest
places are the arithmetic operations, in particular in the CompareFloats,
CompareIntegers and SimpleFloatArithmetic sub-benches. The typical code
for those benches is the following:
PyInteger add(PyObject obj) {
    if (obj instanceof PyInteger) {
        int v = ((PyInteger) obj).value;
        // ...
    }
    // ...
}
It means that we perform thousands of instanceof checks against the same
type, i.e. PyInteger instanceof PyInteger. The small test case below
illustrates the problem. I should mention that the test runs very fast on
Sun 1.6 in server mode (15 msec), while in client mode it completes in
2600 msec. On Harmony VM in server mode the test completes in 2700 msec.
Small test case:
public class Test {
    public static void main(String[] args) {
        // warm up the VM first
        runTest(1000, new String());
        long start = System.currentTimeMillis();
        runTest(1000000000, new String());
        long end = System.currentTimeMillis() - start;
        System.out.println("completed in " + end);
    }

    public static void runTest(int num, String obj) {
        for (int i = 0; i < num; i++) {
            // repeated type check of the same object against its own class
            if (obj instanceof String) {
            }
        }
    }
}
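One plausible reason for the 15 msec result on the Sun 1.6 server VM is that the empty if-body gives the compiler nothing to preserve, so the check (and likely the whole loop) can be removed as dead code; the numbers may therefore not compare like with like. A minimal sketch of a variant closer to the jython pattern, assuming hypothetical stand-ins PyObject and PyInteger (not Jython's real classes): the checked value feeds a sum that is returned and printed, which makes it harder for the JIT to discard the loop.

```java
public class InstanceofSumTest {
    // Hypothetical stand-ins for Jython's PyObject/PyInteger, not the real classes.
    static class PyObject {
    }

    static class PyInteger extends PyObject {
        final int value;

        PyInteger(int value) {
            this.value = value;
        }
    }

    // Same check-and-cast shape as the jython hot path; the result is
    // accumulated and returned instead of being thrown away.
    static long runTest(int num, PyObject obj) {
        long sum = 0;
        for (int i = 0; i < num; i++) {
            if (obj instanceof PyInteger) {
                sum += ((PyInteger) obj).value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        PyObject obj = new PyInteger(1);
        runTest(1000, obj); // warm up
        long start = System.currentTimeMillis();
        long sum = runTest(1000000000, obj);
        long end = System.currentTimeMillis() - start;
        System.out.println("completed in " + end + " msec (sum=" + sum + ")");
    }
}
```

A sufficiently smart compiler could still hoist the loop-invariant check, so this only reduces, not removes, the risk of measuring an empty loop.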
String.compareTo and equals method optimizations (HARMONY-4553)
Expected boost: 700 ms = ~5-7%
Description: the compareTo and equals methods are used in the
CompareStrings and CompareInternedStrings sub-benches and in several
places inside jython. The test below shows that DRLVM is significantly
slower on these operations.
Small test case:
public class CompareToTest {
    public static void main(String[] args) {
        String st1 = new String("0 1 2 3 4 5 6 7 8 9");
        String st2 = new String("0 1 2 3 4 5 6 7 8 9");
        // warm up the VM
        stringCompareTo(st1, st2, 100000);
        long start = System.currentTimeMillis();
        stringCompareTo(st1, st2, 20000000);
        long end = System.currentTimeMillis() - start;
        System.out.println("String compareTo for equal strings completed in "
                + end + " msec");

        // st1 now differs from st2 only by a longer tail ("abc")
        st1 = new String("0 1 2 3 4 5 6 7 8 9abc");
        // warm up the VM
        stringCompareTo(st1, st2, 100000);
        long start1 = System.currentTimeMillis();
        stringCompareTo(st1, st2, 20000000);
        long end1 = System.currentTimeMillis() - start1;
        System.out.println("String compareTo for non-equal strings completed in "
                + end1 + " msec");
        System.out.println("Total in " + (end1 + end) + " msec");
    }

    public static void stringCompareTo(String st1, String st2, int num) {
        for (int x = 0; x < num; x++) {
            st1.compareTo(st2);
        }
    }
}
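For reference, the work both methods have to do can be sketched in plain Java from their documented semantics: compareTo returns the difference of the first differing chars, otherwise the difference of the lengths. This is an illustrative sketch (StringCompareSketch, compareToSketch and equalsSketch are hypothetical names), not the Harmony classlib code or the RI's code. It shows that both measured cases scan the full 19-char common prefix; the reference-identity shortcut in equals may matter for CompareInternedStrings, since equal interned strings are the same object.

```java
public class StringCompareSketch {
    // Lexicographic comparison following the String.compareTo contract.
    static int compareToSketch(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int k = 0; k < n; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                return c1 - c2; // first differing char decides
            }
        }
        return a.length() - b.length(); // common prefix: shorter sorts first
    }

    // equals with an identity fast path that skips the char loop entirely.
    static boolean equalsSketch(String a, Object other) {
        if (a == other) {
            return true;
        }
        if (!(other instanceof String)) {
            return false;
        }
        String b = (String) other;
        if (a.length() != b.length()) {
            return false;
        }
        for (int k = 0; k < a.length(); k++) {
            if (a.charAt(k) != b.charAt(k)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String st1 = "0 1 2 3 4 5 6 7 8 9abc";
        String st2 = "0 1 2 3 4 5 6 7 8 9";
        System.out.println(compareToSketch(st1, st2)); // prints 3 (22 - 19 chars)
    }
}
```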
Thread.currentThread() method optimization (HARMONY-4555)
Expected boost: ~5%
Description: Thread.currentThread() is also one of the hot methods in the
jython bench; it is invoked more than 7.5 million times during a jython
run. Although the method has already been optimized several times, it is
still slower than on the RI. Several weeks ago I made some experiments
with a magics-based implementation and got a good speedup both for the
small test and for the jython bench. Since the threading system is being
redesigned at the moment, I think it would be great to add the
currentThread() optimization to the plan.
Test case:
public class CurrentThreadTest {
    public static void main(String[] args) {
        long st = System.currentTimeMillis();
        for (int i = 0; i < 100000000; i++) {
            Thread.currentThread();
        }
        long res = System.currentTimeMillis() - st;
        System.out.println("res=" + res);
    }
}
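Unlike the other tests, CurrentThreadTest starts timing without a warm-up phase and discards the value returned by Thread.currentThread(). A sketch of a variant, under the assumption that comparable numbers across VMs are wanted: it warms the loop up first and folds each result into a counter that is printed. CurrentThreadWarmTest is a hypothetical name.

```java
public class CurrentThreadWarmTest {
    // Calls Thread.currentThread() n times and counts how often it returns
    // the thread that entered this method, so the result is actually used.
    static int run(int n) {
        Thread self = Thread.currentThread();
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (Thread.currentThread() == self) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        run(100000); // warm up so the loop is likely JIT-compiled before timing
        long st = System.currentTimeMillis();
        int hits = run(100000000);
        long res = System.currentTimeMillis() - st;
        System.out.println("res=" + res + " (hits=" + hits + ")");
    }
}
```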
Could JIT, GC and Thread gurus please have a look at the issues above?
Thanks.
Vladimir.
Sub-bench statistics in milliseconds (H/JDK = Harmony time / JDK time):
Sub-bench                 Harmony    JDK   H/JDK
BuiltinFunctionCalls           63     78   0.807692
BuiltinMethodLookup           265    203   1.305419
CompareFloats                 110     31   3.548387
CompareFloatsIntegers          94     63   1.492063
CompareIntegers               156     31   5.032258
CompareInternedStrings        187     31   6.032258
CompareLongs                   94     32   2.9375
CompareStrings                125     31   4.032258
CompareUnicode                 94     31   3.032258
ConcatStrings                 797    656   1.214939
ConcatUnicode                 562    188   2.989362
CreateInstances               203     62   3.274194
CreateNewInstances            344    204   1.686275
CreateStringsWithConcat       344    156   2.205128
CreateUnicodeWithConcat       141     78   1.807692
DictCreation                  156     78   2
DictWithFloatKeys             328    141   2.326241
DictWithIntegerKeys           157     78   2.012821
DictWithStringKeys             62     62   1
ForLoops                       78     94   0.829787
IfThenElse                    172    234   0.735043
ListSlicing                    63     32   1.96875
NestedForLoops                109    109   1
NormalClassAttribute          156     78   2
NormalInstanceAttribute       125     63   1.984127
PythonFunctionCalls           188     78   2.410256
PythonMethodCalls             250    109   2.293578
Recursion                     250     94   2.659574
SecondImport                  141    109   1.293578
SecondPackageImport           156    141   1.106383
SecondSubmoduleImport         234    187   1.251337
SimpleComplexArithmetic       110     16   6.875
SimpleDictManipulation        156     94   1.659574
SimpleFloatArithmetic         109     62   1.758065
SimpleIntFloatArithmetic       78     16   4.875
SimpleIntegerArithmetic        63     31   2.032258
SimpleListManipulation         62     31   2
SimpleLongArithmetic          157    188   0.835106
SmallLists                    343    141   2.432624
SmallTuples                   250    125   2
SpecialClassAttribute         141     93   1.516129
SpecialInstanceAttribute      125     63   1.984127
StringMappings                328    125   2.624
StringPredicates              219    109   2.009174
StringSlicing                 140     78   1.794872
TryExcept                      16      0   n/a
TryRaiseExcept               1641    500   3.282
TupleSlicing                  172     94   1.829787
UnicodeMappings               156    110   1.418182
UnicodePredicates             219     78   2.807692
UnicodeSlicing                140     62   2.258065
Total                       10829   5578
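The Harmony-vs-JDK ratio column is Harmony time divided by JDK time (e.g. 63 / 78 = 0.807692 for BuiltinFunctionCalls), so values above 1 mean Harmony is slower. For TryExcept the JDK time rounds to 0 msec and the ratio is undefined. A tiny sketch of the computation; RatioSketch and ratio() are hypothetical names, and returning NaN for a zero JDK time is just one way to mark the undefined case.

```java
public class RatioSketch {
    // Harmony time divided by JDK time; NaN when the JDK time is 0.
    static double ratio(long harmonyMs, long jdkMs) {
        if (jdkMs == 0) {
            return Double.NaN;
        }
        return (double) harmonyMs / jdkMs;
    }

    public static void main(String[] args) {
        System.out.println(ratio(1641, 500));   // TryRaiseExcept
        System.out.println(ratio(10829, 5578)); // totals
        System.out.println(ratio(16, 0));       // TryExcept
    }
}
```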