And an update here: I have confirmed that the main contributor is ValueProfiler.

RI measurement (again):

=== /localdisk/jdk1.6.0_02/bin/java -server GenericQuicksort2 ===
iteration 0: elapsed: 4825ms
iteration 1: elapsed: 4805ms
iteration 2: elapsed: 5128ms
iteration 3: elapsed: 5125ms
iteration 4: elapsed: 5130ms

Baseline measurement (again):

=== /nfs/pb/home/ashipile/jre-r610377-clean/bin/java -Xem:server GenericQuicksort2 ===
iteration 0: elapsed: 178898ms
iteration 1: elapsed: 5663ms
iteration 2: elapsed: 5666ms
iteration 3: elapsed: 5660ms
iteration 4: elapsed: 5672ms

Collapsing the critical section in ValueProfiler::addNewValue to wrap only insert_into_tnv_table. This is meant as an initial proof of concept for moving to a CAS-based increment. Note that the first iteration time decreased significantly, so we might consider CAS as an option:

=== /nfs/pb/home/ashipile/jre-r610377-work/bin/java -Xem:server GenericQuicksort2 ===
iteration 0: elapsed: 85127ms
iteration 1: elapsed: 5665ms
iteration 2: elapsed: 5665ms
iteration 3: elapsed: 5667ms
iteration 4: elapsed: 5679ms

Removing synchronization from VP altogether (replacing lockProfile/unlockProfile with empty stubs rather than hymutex_*). Note the further decrease in ramp-up time and the *boost* in the later iterations (probably because there is no more locking for concurrent profiling of SD1_OPT methods?):

=== /nfs/pb/home/ashipile/jre-r610377-work/bin/java -Xem:server GenericQuicksort2 ===
iteration 0: elapsed: 79678ms
iteration 1: elapsed: 5018ms
iteration 2: elapsed: 5014ms
iteration 3: elapsed: 5013ms
iteration 4: elapsed: 5028ms

The profile of this mode, FIRST iteration, after 30 seconds of run:

27% Other32
21% libem#addNewValue
10% libharmonyvm#helper_get_interface_vtable
17% libem#find
8% libem#value_profiler_add_value
3% libem#getVPC
5% libharmonyvm#rth_get_interface_vtable
6% libjitrino#add_value_profile_value

The profile of this mode, LAST iteration:

99% Other32
1% libjitrino#<various>

Note that the locks have disappeared from the profile, which confirms that the VP locks were the problem.
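
To make the CAS option above a bit more concrete, here is a minimal sketch of a lock-free counter increment in C++. It is only an illustration under assumptions: ValueSlot, cas_increment, and hammer_slot are hypothetical names that do not come from the Harmony sources, and the real TNV table layout may well differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for one slot of a top-N-value (TNV) table.
struct ValueSlot {
    std::uint64_t value = 0;              // the profiled value (unused in this sketch)
    std::atomic<std::uint32_t> count{0};  // how often the value was observed
};

// Increment the hit counter with a CAS loop instead of taking a mutex.
inline void cas_increment(std::atomic<std::uint32_t>& counter) {
    std::uint32_t seen = counter.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak stores the current value into `seen`,
    // so the loop simply retries with the fresh value.
    while (!counter.compare_exchange_weak(seen, seen + 1,
                                          std::memory_order_relaxed)) {
    }
}

// Hypothetical helper: hit one slot from several threads, return the final count.
inline std::uint32_t hammer_slot(unsigned threads, std::uint32_t hits_per_thread) {
    assert(threads > 0);
    ValueSlot slot;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&slot, hits_per_thread] {
            for (std::uint32_t i = 0; i < hits_per_thread; ++i) {
                cas_increment(slot.count);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return slot.count.load();
}
```

No increments are lost here even without a lock, so hammer_slot(4, 100000) should come back as 400000. For a plain counter, count.fetch_add(1) does the same in a single call; the explicit loop is shown only because it is the CAS form discussed above. Whether the rest of insert_into_tnv_table can be made lock-free is a separate question that this sketch does not answer.
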
After ramp-up there seems to be very little JRE activity; most of the time is spent executing user code.

I'm going to propose the option that eliminates synchronization from VP completely, sacrificing profile accuracy. Egor, Pavel, what do you think? Is removing the synchronization too dangerous?

Just a thought: the next thing we should consider is making VP stop profiling once the optimized version of the code is available, since we don't care about profile information after that.

Thanks,
Aleksey.
