2010/5/3 Martin Buchholz <marti...@google.com>

> It's a coding style made popular by Doug Lea.
> It's an extreme optimization that probably isn't necessary;
> you can expect the JIT to make the same optimizations.

It certainly is necessary, unfortunately. I tested my particle/octree-based 3D renderer without this manual optimization (printing the FPS every 100 frames, starting with the 10th reading after startup):

JDK 6u21-b03, HotSpot Client:
159.4896331738437fps
161.29032258064515fps
158.73015873015873fps
160.0fps
159.23566878980893fps

JDK 6u21-b03, HotSpot Server:
197.23865877712032fps
204.91803278688525fps
196.07843137254903fps
200.40080160320642fps
198.01980198019803fps

Now let's cache 8 instance variables into local variables (most of them final, plus a couple of non-final ones):

JDK 6u21-b03, HotSpot Client:
169.4915254237288fps
172.1170395869191fps
168.63406408094434fps
168.0672268907563fps
170.64846416382252fps

JDK 6u21-b03, HotSpot Server:
197.62845849802372fps
200.40080160320642fps
196.8503937007874fps
199.6007984031936fps
203.2520325203252fps

So the manual optimization makes no difference for HotSpot Server, but hell, it does for Client: about 6% better performance in this test. And the test is not only the complex, deeply nested rendering loops that use those cacheable variables to read the input data and update the output pixel and Z buffers. There is also other code that burns significant CPU and doesn't use these variables, notably the buffer filling and copying steps. This means the speedup in the optimized code itself should be much higher than 6%; I only measured and reported the application's global performance.

We'll need to deal with HotSpot Client for years to come, not to mention smaller platforms (JavaME, JavaFX Mobile & TV) whose JIT compilers are even weaker than JavaSE's C1. Tuned bytecode is also faster to interpret, which benefits warm-up time too.

Please keep your dirty purist hands off the API code that Doug and others micro-optimized; it is necessary. :) And my +1 to adding the same optimizations to other perf-critical APIs. This is even more important for java.nio, since under C1 it doesn't currently benefit from intrinsic compilation of critical DirectBuffer methods.

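For readers unfamiliar with the idiom, here is a minimal sketch of what "caching instance variables into local variables" looks like. The class, field, and method names are hypothetical and are not taken from the renderer measured above; the sketch only shows the shape of the change, and any speedup depends on the JIT compiler in use.

```java
// Illustrative sketch only: PixelBufferSketch and its members are
// hypothetical names, not code from the renderer discussed above.
final class PixelBufferSketch {
    private final int[] pixels;
    private final int width;
    private final int height;

    PixelBufferSketch(int width, int height) {
        this.width = width;
        this.height = height;
        this.pixels = new int[width * height];
    }

    // Straightforward version: each use of pixels/width/height is a
    // field access on `this` at the bytecode level (getfield).
    void fillFromFields(int color) {
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                pixels[y * width + x] = color;
            }
        }
    }

    // Manually optimized version: the fields are copied to locals once,
    // so the loops only touch local variables.
    void fillFromLocals(int color) {
        final int[] pixels = this.pixels;
        final int width = this.width;
        final int height = this.height;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                pixels[y * width + x] = color;
            }
        }
    }

    int pixelAt(int x, int y) {
        return pixels[y * width + x];
    }
}
```

Both methods compute the same result; the only difference is where the loop reads its operands from.
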
A+
Osvaldo

> (you can try to check the machine code yourself!)
> Nevertheless, copying to locals produces the smallest
> bytecode, and for low-level code it's nice to write code
> that's a little closer to the machine.
>
> Also, optimizations of finals (can cache even across volatile
> reads) could be better. John Rose is working on that.
>
> For some algorithms in j.u.c,
> copying to a local is necessary for correctness.
>
> Martin
>
> On Mon, May 3, 2010 at 04:40, Ulf Zibis <ulf.zi...@gmx.de> wrote:
> > Hi,
> >
> > in class String I often see member variables copied to local variables.
> > In java.nio.Buffer I don't see that (e.g. for "position" in
> > nextPutIndex(int nb)).
> > Now I'm wondering.
> >
> > From JMM (Java-Memory-Model) I learned, that jvm can hold non-volatile
> > variables in a cache for each thread, so e.g. even in CPU register for
> > few ones.
> > From this knowing, I don't understand, why doing the local caching
> > manually in String (and many other classes), instead trusting on the JVM.
> >
> > Can anybody help me in understanding this ?
> >
> > -Ulf
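
On the quoted remark that copying to a local is sometimes necessary for correctness: the sketch below is a generic illustration of that point, not code from java.util.concurrent, and all names in it are hypothetical. It assumes a field that another thread may replace at any time; reading the field once into a local guarantees that the length check and the element access refer to the same array.

```java
// Illustrative sketch only: SnapshotSketch is a hypothetical class,
// not taken from java.util.concurrent.
final class SnapshotSketch {
    // May be replaced by another thread at any time.
    private volatile int[] data = new int[] {1, 2, 3};

    void replace(int[] newData) {
        data = newData;
    }

    // Racy: `data` is read twice. If another thread calls replace()
    // between the two reads, the index computed from one array is
    // applied to a different array and may be out of bounds.
    int lastRacy() {
        return data[data.length - 1];
    }

    // Safe: the field is read exactly once; both the length and the
    // element access use the same snapshot.
    int lastSafe() {
        final int[] d = data;
        return d[d.length - 1];
    }
}
```

Here the local copy is not a performance tweak: it is what makes the two uses of the array consistent with each other.
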