Guys,
I am not very stuck with ABCD. I am not. Not at all. Not am I.
Ehm.
Lickily, I finished digging into the implementation and making sure it
is correct. Now I am pretty confident that classic_abcd does the right
thing! (no guarantees, you know, it's software) Had to refactor the
code a bit to fill in the gaps of my poor understanding. I think, we
should commit the changes...
featuring:
* two-state Inequality Graph, dot printing is just beautiful
* better readability
* unit-like-tests against the new functionality
* option: -XX:jit.arg.dump_abcd_stats=true to dump stats (total/eliminated)
* same amount of checks eliminated as before
* well-known tests breaking oldish ABCD _passed_, of course
in all, HARMONY-4476 (more details in JIRA)
Given that Windows is not what I am lucky with today, if a JIT guru
(Pavel, Mikhail, George?) had time to take a look at the patch and run
'build test' on Windows that would be really-really great!
And now the ugly porn:
1. I could not run almost all of DaCapo benches for various reasons, so
tested only on hsqldb, wow, anybody aware of it or is it just me
ugly little creature? had little time, sorry
2. ~10% bounds checks removed in hot methods and 8% in total, with
ABCD innocent and many other optimizations very very guilty or
absent. OMG!
3. "memopt" is probably the ugliest!! In my example:
1. array A lies in a non-volatile field
2. A.length is computed right in the method entrance
3. A is then loaded via "ldfield" bytecode instruction in the
middle of the method
4. nothing writes to the field, just accesses elements of A
And who could imagine that A.length would be computed twice with
"memopt" having nothing to do with second appearance A.length?
Thus, not eliminating the second appearance of this sequence:
I9:ldflda [t1.BidirectionalBubbleSort2::a] -) t4:ref:int32[]
I10:ldind.unc:[] [t4] ((t2,t3)) -) t5:int32[]
I11:chknull t5 -) t6:tau
I12:tauhastype t5,int32[] -) t7:tau
I13:arraylen t5 ((t6,t7)) -) t8:int32
this is what should be optimized out definitely! and the thing that
breaks two A.length-s apart killing the idea of ABCD.
4. "loop versioning" is not implemented, and this is what I would like
to take. I already wrote some 2-component-nullstone-like
performance tests to detect how good a JVM deals with loop
versioning. Will post the bunch of them soon.
JIT gurus,
given the ugly (1) - (4), rather critical for performance do you like
the idea to fight them in a high priority? Could you share your
vision, please?
--
Egor Pasko