Christian,

Should we put the "skip bytecode quickening" code under a flag, so that we do this only when invokedynamic is enabled? Or is put_code zero only in the invokedynamic case?
On 8/8/11 6:56 AM, Christian Thalinger wrote:
>> Why do you use ld_ptr() on SPARC to load from the cache, but movl()
>> (only 32 bits) on X86 and X64?
>
> Good question.  I took the code from TemplateTable::resolve_cache_and_index
> without thinking about it, and that one uses ld_ptr.
>
> _indices in ConstantPoolCacheEntry is defined as intx:
>
>   volatile intx _indices;  // constant pool index & rewrite bytecodes
>
> and bytecodes 1 and 2 are in the upper 16 bits of the lower 32-bit word:
>
>   // bit number |31                0|
>   // bit length |-8--|-8--|---16----|
>   // --------------------------------
>   // _indices   [ b2 | b1 |  index  ]
>
> Loading 32 bits on a little-endian machine gives you the right bits, but on
> a big-endian machine it does not.  I think that's the reason for the
> "optimization" on x64.

I don't like this "optimization", but I understand why we are using it.  Add
a comment (especially in the x64 file).  (A small endianness example follows
after this message.)

>> I am concerned about the use of a short branch in the new code in
>> templateTable_sparc.cpp:
>>
>>   cmp_and_br_short(..., L_patch_done);  // don't patch
>>
>> There is a __ stop() call which generates a lot of code, so the label
>> L_patch_done could be far away.
>
> Yeah, I thought I'd give it a try and see if it works.  cmp_and_br_short
> should assert if the branch displacement is too far, right?

Yes, it will assert, but maybe only in some worst case which we do not test.
For example, try running a 64-bit fastdebug VM on SPARC with compressed oops
and VerifyOops.

>> Why did you add a new #include into ciEnv.cpp and nmethod.cpp?  What code
>> needs it?  Nothing else is changed in these files.
>
> Both files use dependencies and I got linkage errors on Linux while working
> on the fix (because of inline methods).  It seems that the include is not
> required in ciEnv.cpp because ciEnv.hpp already includes it.  I missed that.
> But nmethod.cpp needs it because nmethod.hpp only declares class
> Dependencies.

OK.

>> Why did you not leave "volatile" call site inlining with a guard?  You did
>> not explain why a virtual call is fine for it.
>
> The spec of MutableCallSite says:
>
>   "For target values which will be frequently updated, consider using a
>   volatile call site instead."
>
> And VolatileCallSite says:
>
>   "A VolatileCallSite is a CallSite whose target acts like a volatile
>   variable.  An invokedynamic instruction linked to a VolatileCallSite sees
>   updates to its call site target immediately, even if the update occurs in
>   another thread.  There may be a performance penalty for such tight
>   coupling between threads.
>
>   Unlike MutableCallSite, there is no syncAll operation on volatile call
>   sites, since every write to a volatile variable is implicitly synchronized
>   with reader threads.
>
>   In other respects, a VolatileCallSite is interchangeable with
>   MutableCallSite."
>
> Since VolatileCallSite really should only be used when you know the target
> changes very often, we don't do optimizations for this case.  Obviously this
> is just a guess at how people will use VolatileCallSite, but I think for now
> this is a safe bet.

Thank you for explaining it.

> Additionally I had to make two small changes because the build was broken on
> some configurations:
>
>   - klassOop new_type = _changes.is_klass_change() ?
>       _changes.as_klass_change()->new_type() : NULL;
>   + klassOop new_type = _changes.is_klass_change() ?
>       _changes.as_klass_change()->new_type() : (klassOop) NULL;
>
> and
>
>   - MutexLockerEx ccl(CodeCache_lock, thread);
>   + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>
> I updated the webrev.

Good.
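A note on the (klassOop) NULL cast in the diff above: the sketch below shows the shape of that fix in isolation. It is a minimal, self-contained illustration only; Klass, klassOop and current_new_type() are stand-ins here, not the real HotSpot declarations.

  // Minimal illustration of the conditional-operator fix in the diff above.
  // "Klass", "klassOop" and current_new_type() are stand-ins, not the real
  // HotSpot declarations.
  #include <cstddef>

  struct Klass {};
  typedef Klass* klassOop;

  static klassOop current_new_type(bool is_klass_change, klassOop kt) {
    // The uncast form below is the one replaced in the diff above (the build
    // was broken on some configurations); the explicit cast gives both
    // branches of ?: the same type, independent of how the toolchain
    // defines NULL:
    //   return is_klass_change ? kt : NULL;
    return is_klass_change ? kt : (klassOop) NULL;
  }

  int main() {
    Klass k;
    klassOop a = current_new_type(true, &k);   // &k
    klassOop b = current_new_type(false, &k);  // NULL
    return (a == &k && b == NULL) ? 0 : 1;
  }

The explicit cast is simply the portable spelling; the behaviour of the function is unchanged.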
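On the endianness point above (why a plain 32-bit load of _indices is only safe on little-endian machines), a small standalone program makes the difference concrete. The packed layout mirrors the bit diagram quoted in the mail, but the names and values are illustrative, not the real HotSpot code.

  // Small endianness demo for the _indices discussion above.  The packed
  // layout mirrors the comment quoted in the mail:
  //   bit number |31                0|
  //   bit length |-8--|-8--|---16----|
  //   _indices   [ b2 | b1 |  index  ]
  // The names and values below are illustrative, not the real HotSpot code.
  #include <cstdint>
  #include <cstdio>
  #include <cstring>

  int main() {
    const uint32_t index = 0x1234;  // constant pool index
    const uint32_t b1    = 0xB1;    // bytecode 1
    const uint32_t b2    = 0xB2;    // bytecode 2

    // 64-bit "intx"-like field with the packed layout in its lower 32 bits.
    uint64_t indices = (uint64_t)((b2 << 24) | (b1 << 16) | index);

    // A 32-bit load from the field's address (what movl does on x86/x64):
    uint32_t first_word;
    std::memcpy(&first_word, &indices, sizeof(first_word));

    // Little endian: the low-order half lives at the lowest address, so this
    // prints 0xB2B11234 -- the packed bytecodes and index.
    // Big endian: the lowest address holds the *upper* half of the 64-bit
    // value, so the same load returns 0 here -- the wrong bits.  Hence
    // ld_ptr (a full pointer-sized load) on SPARC.
    std::printf("32-bit load at field address: 0x%08X\n", (unsigned) first_word);
    return 0;
  }

On x86/x64 this prints 0xB2B11234, so the 32-bit load sees the packed bytecodes and index; on a big-endian machine such as SPARC the same load would return the upper half of the 64-bit field, which is why the SPARC code loads the full value with ld_ptr.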
Vladimir

> -- Christian
>
>> Vladimir
>>
>> On 8/5/11 6:32 AM, Christian Thalinger wrote:
>>> http://cr.openjdk.java.net/~twisti/7071653
>>>
>>> 7071653: JSR 292: call site change notification should be pushed not pulled
>>> Reviewed-by:
>>>
>>> Currently every speculatively inlined method handle call site has a
>>> guard that compares the current target of the CallSite object to the
>>> inlined one.  This per-invocation overhead can be removed if the
>>> notification is changed from pulled to pushed (i.e. deoptimization).
>>>
>>> I had to change the logic in TemplateTable::patch_bytecode to skip
>>> bytecode quickening for putfield instructions when the put_code
>>> written to the constant pool cache is zero.  This is required so that
>>> every execution of a putfield to CallSite.target calls out to
>>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>>> dependent compiled methods.
>>>
>>> I also had to change the dependency machinery to understand other
>>> dependencies than class hierarchy ones.  DepChange became the super-type
>>> of two new dependencies, KlassDepChange and CallSiteDepChange.
>>>
>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>>> tests and vm.mlvm tests.
>>>
>>> Here is the speedup for the JRuby fib benchmark (the first run is JDK 7
>>> b147, the second is with 7071653).  Since the CallSite targets don't
>>> change during the runtime of this benchmark we can see the performance
>>> benefit of eliminating the guard:
>>>
>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>   0.883000   0.000000   0.883000 (  0.854000)
>>>   0.715000   0.000000   0.715000 (  0.715000)
>>>   0.712000   0.000000   0.712000 (  0.712000)
>>>   0.713000   0.000000   0.713000 (  0.713000)
>>>   0.713000   0.000000   0.713000 (  0.712000)
>>>
>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>>   0.772000   0.000000   0.772000 (  0.742000)
>>>   0.624000   0.000000   0.624000 (  0.624000)
>>>   0.621000   0.000000   0.621000 (  0.621000)
>>>   0.622000   0.000000   0.622000 (  0.622000)
>>>   0.622000   0.000000   0.622000 (  0.621000)
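For anyone skimming the thread, the patch_bytecode change Christian describes above boils down to: when the put_code written to the constant pool cache is zero, the putfield is never quickened, so each execution goes back through InterpreterRuntime::resolve_get_put, which is where dependent compiled methods can be deoptimized. A rough, self-contained sketch of that control flow follows; every type, constant and function in it is a stand-in, not the actual HotSpot template-interpreter code.

  // Rough sketch of the control flow described in the mail above.  All types,
  // constants and functions here are stand-ins; the real logic lives in the
  // platform TemplateTable::patch_bytecode assembly and in
  // InterpreterRuntime::resolve_get_put.
  #include <stdint.h>

  struct ConstantPoolCacheEntry {
    uint8_t put_code;   // quickened bytecode for the putfield, or 0 if none
  };

  // putfield is 0xB5 per the JVM spec; the fast bytecode number below is an
  // arbitrary stand-in, not taken from HotSpot.
  static const uint8_t BC_putfield      = 0xB5;
  static const uint8_t BC_fast_putfield = 0xCB;

  // Stand-in for InterpreterRuntime::resolve_get_put: resolve the field and,
  // if it is CallSite.target, deoptimize dependent compiled methods (the
  // "pushed" notification) and leave put_code at zero.
  static void resolve_get_put(ConstantPoolCacheEntry& e, bool is_call_site_target) {
    if (is_call_site_target) {
      // ... deoptimize nmethods depending on this call site ...
      e.put_code = 0;                    // keep coming back here on every putfield
    } else {
      e.put_code = BC_fast_putfield;     // normal field: allow quickening
    }
  }

  // Stand-in for the patch_bytecode step: with put_code == 0 the quickening
  // is skipped, so the original putfield stays in place and re-resolves.
  static void maybe_patch_bytecode(uint8_t* bcp, const ConstantPoolCacheEntry& e) {
    if (e.put_code == 0) return;         // skip quickening (the new behaviour)
    *bcp = e.put_code;                   // rewrite to the fast bytecode
  }

  int main() {
    uint8_t bytecode = BC_putfield;
    ConstantPoolCacheEntry entry = { 0 };

    resolve_get_put(entry, /*is_call_site_target=*/ true);
    maybe_patch_bytecode(&bytecode, entry);
    // bytecode is still BC_putfield, so the next execution resolves again and
    // any dependent compiled code can be deoptimized at that point.
    return bytecode == BC_putfield ? 0 : 1;
  }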
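Similarly, here is a bare-bones sketch of the shape of the dependency-change hierarchy mentioned above (DepChange as the super-type of KlassDepChange and CallSiteDepChange). The class names and the is_klass_change()/as_klass_change()/new_type() accessors come from this thread; the oop/klassOop stand-ins and the remaining members are illustrative guesses, not the real declarations.

  // Bare-bones sketch of the DepChange split described above.  Only the class
  // names and the accessors seen in this thread are taken from the source;
  // everything else is illustrative.
  struct klassOopDesc {};
  typedef klassOopDesc* klassOop;
  struct oopDesc {};
  typedef oopDesc* oop;

  class KlassDepChange;
  class CallSiteDepChange;

  // Common super-type for "something changed that compiled code may depend on".
  class DepChange {
   public:
    virtual ~DepChange() {}
    virtual bool is_klass_change()     const { return false; }
    virtual bool is_call_site_change() const { return false; }
    inline KlassDepChange*    as_klass_change();      // defined below
    inline CallSiteDepChange* as_call_site_change();  // defined below
  };

  // Class-hierarchy change: a new subtype has been loaded.
  class KlassDepChange : public DepChange {
    klassOop _new_type;
   public:
    explicit KlassDepChange(klassOop nt) : _new_type(nt) {}
    virtual bool is_klass_change() const { return true; }
    klassOop new_type() const { return _new_type; }
  };

  // Call site change: a CallSite's target method handle has been replaced.
  class CallSiteDepChange : public DepChange {
    oop _call_site;
    oop _method_handle;
   public:
    CallSiteDepChange(oop cs, oop mh) : _call_site(cs), _method_handle(mh) {}
    virtual bool is_call_site_change() const { return true; }
    oop call_site()     const { return _call_site; }
    oop method_handle() const { return _method_handle; }
  };

  inline KlassDepChange* DepChange::as_klass_change() {
    return is_klass_change() ? static_cast<KlassDepChange*>(this) : (KlassDepChange*) 0;
  }
  inline CallSiteDepChange* DepChange::as_call_site_change() {
    return is_call_site_change() ? static_cast<CallSiteDepChange*>(this) : (CallSiteDepChange*) 0;
  }

  // Usage pattern from the diff earlier in the thread:
  //   klassOop new_type = changes.is_klass_change() ?
  //       changes.as_klass_change()->new_type() : (klassOop) NULL;

The commented usage pattern at the end is the one that appears in the klassOop diff earlier in the thread.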