Obviously we need to benchmark with caching disabled, but I will do that as part of the Caliper integration. Some benchmarks may have improved only because of the caching. A clean layer for caching these derived attributes could help a lot with higher-level algorithms.
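
Roughly the kind of layer I mean, as a minimal sketch (the CachedVector
class and its method names are made up for illustration, not the actual
Vector API): derived attributes are computed lazily, memoized, and
invalidated on mutation.

    // Hypothetical sketch of a caching layer for derived attributes.
    // CachedVector and getLengthSquared() are illustrative names only.
    public class CachedVector {
      private final double[] values;
      private double lengthSquared = -1;  // -1 means "not yet computed"

      public CachedVector(double[] values) {
        this.values = values;
      }

      // Derived attribute: computed once, then served from the cache.
      public double getLengthSquared() {
        if (lengthSquared < 0) {
          double sum = 0;
          for (double v : values) {
            sum += v * v;
          }
          lengthSquared = sum;
        }
        return lengthSquared;
      }

      // Any mutation must invalidate the cached derived attributes.
      public void set(int index, double value) {
        values[index] = value;
        lengthSquared = -1;
      }
    }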
Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.


On Sat, May 4, 2013 at 10:32 PM, Robin Anil <[email protected]> wrote:

> Caching layer would trivially give us a lot of benefit, especially for
> repeat calls. I think that's very low-hanging fruit.
>
> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>
>
> On Sat, May 4, 2013 at 10:31 PM, Robin Anil <[email protected]> wrote:
>
>> Here is the overall speedup so far since 0.7:
>>
>> https://docs.google.com/spreadsheet/ccc?key=0AhewTD_ZgznddGFQbWJCQTZXSnFULUYzdURfWDRJQlE#gid=3
>>
>> I back-ported the current benchmark code (disabling SerializationBenchmark,
>> which doesn't seem to work with 0.7) and ran it against 0.7.
>> There is one regression, but the rest have been pretty positive.
>>
>>
>> On Sun, Apr 21, 2013 at 12:37 PM, Ted Dunning <[email protected]> wrote:
>>
>>> On Sun, Apr 21, 2013 at 10:27 AM, Dan Filimon <[email protected]> wrote:
>>>
>>> > > But multi-threaded assign would be very dangerous. Even if you assign
>>> > > different parts of the vector to different threads, you have to worry
>>> > > about cache line alignment, which is generally not visible to Java
>>> > > without very special effort.
>>> >
>>> > I'm terrible at explaining what I mean.
>>> > So, rather than have the threads assign chunks of a vector (which would
>>> > only really work if the underlying Vector was an array of doubles), each
>>> > thread would return an OrderedIntDoubleMapping, and they would be merged
>>> > into a Vector by a single thread at the end.
>>> >
>>> > I wonder, even talking about cache alignment worries in Java makes me
>>> > wonder whether we'd be trying to outwit the JVM. It feels kind of wrong,
>>> > as I'm certain that the people writing Hotspot are better at optimizing
>>> > the code than I am. :)
>>>
>>> Yeah... the single-thread updater is pretty much what I meant when I
>>> talked about a threaded map-reduce. Everybody produces new data in
>>> parallel, and then a single thread per vector makes it all coherent.
>>>
>>> This is actually kind of similar to the way that very large HPC matrix
>>> libraries work. Message passing can be more efficient as an idiom for
>>> communicating data like this than shared memory, even when efficient
>>> shared memory is available.
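
For reference, the produce-in-parallel, merge-in-one-thread pattern Dan
and Ted describe above could look roughly like the sketch below. This is
not Mahout code: the Update class is a stand-in for
OrderedIntDoubleMapping, and the function being applied is arbitrary.
The point is that the workers never write to the shared array, so the
cache-line alignment worry goes away.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelAssignSketch {
      // Stand-in for OrderedIntDoubleMapping: (index, value) pairs.
      static final class Update {
        final int index;
        final double value;
        Update(int index, double value) { this.index = index; this.value = value; }
      }

      public static void main(String[] args) throws Exception {
        final double[] vector = new double[1000000];
        int nThreads = 4;
        int chunk = vector.length / nThreads;
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<Future<List<Update>>> futures = new ArrayList<>();

        // "Map" phase: each worker computes updates for its range but
        // never writes to the shared array.
        for (int t = 0; t < nThreads; t++) {
          final int start = t * chunk;
          final int end = (t == nThreads - 1) ? vector.length : start + chunk;
          futures.add(pool.submit(() -> {
            List<Update> updates = new ArrayList<>();
            for (int i = start; i < end; i++) {
              updates.add(new Update(i, Math.sqrt(i)));  // arbitrary stand-in function
            }
            return updates;
          }));
        }

        // "Reduce" phase: a single thread applies every update, so the
        // writes are coherent without locks or false-sharing concerns.
        for (Future<List<Update>> f : futures) {
          for (Update u : f.get()) {
            vector[u.index] = u.value;
          }
        }
        pool.shutdown();
      }
    }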
