On 1/10/08, Aleksey Shipilev <[EMAIL PROTECTED]> wrote:
>
> And update here. I have confirmed that the main contributor is
> ValueProfiler.
>
> RI measurement (again):
> === /localdisk/jdk1.6.0_02/bin/java -server GenericQuicksort2 ===
> iteration 0: elapsed: 4825ms
> iteration 1: elapsed: 4805ms
> iteration 2: elapsed: 5128ms
> iteration 3: elapsed: 5125ms
> iteration 4: elapsed: 5130ms
>
> Baseline measurement (again):
> === /nfs/pb/home/ashipile/jre-r610377-clean/bin/java -Xem:server
> GenericQuicksort2 ===
> iteration 0: elapsed: 178898ms
> iteration 1: elapsed: 5663ms
> iteration 2: elapsed: 5666ms
> iteration 3: elapsed: 5660ms
> iteration 4: elapsed: 5672ms
>
> Collapsing critical section in ValueProfiler::addNewValue to wrap only
> insert_into_tnv_table - that should be initial proof-of-concept for
> going to CAS increase, Note that first iteration time decreased
> significantly, so we might consider CAS as an option:
> === /nfs/pb/home/ashipile/jre-r610377-work/bin/java -Xem:server
> GenericQuicksort2 ===
> iteration 0: elapsed: 85127ms
> iteration 1: elapsed: 5665ms
> iteration 2: elapsed: 5665ms
> iteration 3: elapsed: 5667ms
> iteration 4: elapsed: 5679ms
>
>
> Removing synchronization from VP at all (replacing
> lockProfile/unlockProfile with empty stubs rather that hymutex_*),
> note more decrease in rampup time and *boost* on next stages
> (probably, no more locking for concurrent SD1_OPT methods profiling?):
> === /nfs/pb/home/ashipile/jre-r610377-work/bin/java -Xem:server
> GenericQuicksort2 ===
> iteration 0: elapsed: 79678ms
> iteration 1: elapsed: 5018ms
> iteration 2: elapsed: 5014ms
> iteration 3: elapsed: 5013ms
> iteration 4: elapsed: 5028ms
>
> The profile of this mode, FIRST iteration, after 30 seconds of run:
> 27% Other32
> 21% libem#addNewValue
> 10% libharmonyvm#helper_get_interface_vtable
> 17% libem#find
> 8% libem#value_profiler_add_value
> 3% libem#getVPC
> 5% libharmonyvm#rth_get_interface_vtable
> 6% libjitrino#add_value_profile_value
>
> The profile of this mode, LAST iteration:
> 99% Other32
> 1% libjitrino#<various>
>
> Note that locks are disappeared - that testifies the problem with VP
> locks. After rampup there seem to be just a little JRE activity, most
> of the time executing user code.
>
> I'm going to propose the option that eliminates synchronization from
> VP completely sacrificing profile accuracy. Egor, Pavel, what do you
> think? Is synchronization removal too dangerous?

Removing synchronization is unlikely to break execution: there are no
allocations and no object moving in the addNewValue method of the value
profiler. But it may lead to intermittent slowdowns of the code. A race
is possible when one thread adds a value to a slot while another thread
assigns the frequency of a different value to the same slot. We need to
evaluate this approach on other workloads.

Egor's suggestion to not update a value profile if a flag is up might
work better. I'm also thinking of packing the 2 fields of the
Simple_TNV_Table structure into a single value that would be written
atomically. This would solve profile mangling in the lock-less solution.

So, the subsequent steps might be the following:
- check how the lock-less solution works with bigger workloads (SPECs,
  DaCapo, etc.)
- check Egor's suggestion with a flag
- if needed, prototype atomic profile update

> Just a thought: next thing we should consider is making VP to stop
> profiling after optimized version of code is available, since we don't
> care about profile information further.

I think this is worth implementing.

> Thanks,
> Aleksey.
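
To make the "atomic profile update" step concrete, here is a minimal sketch of packing a slot's value and frequency into one word that is updated with a single compare-and-swap. It is not the actual Simple_TNV_Table code: PackedSlot, pack, slot_value, slot_freq, add_value and packed_slot_demo are hypothetical names, and the 32-bit value / 32-bit frequency layout is an assumption (it also assumes the platform has a lock-free 64-bit CAS). A frequency of 0 marks an empty slot.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one profile slot (not the real Simple_TNV_Table):
// the high 32 bits hold the profiled value, the low 32 bits its frequency.
struct PackedSlot {
    std::atomic<std::uint64_t> bits{0};
};

inline std::uint64_t pack(std::uint32_t value, std::uint32_t freq) {
    return (static_cast<std::uint64_t>(value) << 32) | freq;
}

inline std::uint32_t slot_value(std::uint64_t bits) {
    return static_cast<std::uint32_t>(bits >> 32);
}

inline std::uint32_t slot_freq(std::uint64_t bits) {
    return static_cast<std::uint32_t>(bits);
}

// Count one occurrence of `value` in `slot` without taking a lock.
// Returns true if the slot was empty (and is now claimed for `value`) or
// already held `value`; returns false if it holds a different value.
// Value and frequency change together in one CAS, so a reader can never
// observe the value of one entry paired with the frequency of another.
inline bool add_value(PackedSlot& slot, std::uint32_t value) {
    std::uint64_t old_bits = slot.bits.load(std::memory_order_relaxed);
    for (;;) {
        const std::uint32_t freq = slot_freq(old_bits);
        if (freq != 0 && slot_value(old_bits) != value) {
            return false;                 // slot belongs to another value
        }
        if (freq == UINT32_MAX) {
            return true;                  // saturated: keep the count as is
        }
        const std::uint64_t new_bits = pack(value, freq + 1);
        // On failure, compare_exchange_weak reloads old_bits and we retry.
        if (slot.bits.compare_exchange_weak(old_bits, new_bits,
                                            std::memory_order_relaxed)) {
            return true;
        }
    }
}

// Single-threaded usage check of the sketch above.
inline void packed_slot_demo() {
    PackedSlot slot;
    assert(add_value(slot, 7));           // claims the empty slot
    assert(add_value(slot, 7));           // bumps the frequency
    assert(!add_value(slot, 9));          // different value is rejected
    assert(slot_value(slot.bits.load()) == 7);
    assert(slot_freq(slot.bits.load()) == 2);
}
```

Relaxed ordering is used because the profile is only a statistical hint here; a caller that gets false would move on to the next slot or apply whatever replacement policy the table uses.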
