On Fri, Nov 30, 2012 at 02:00:03PM +0000, Lux, Jim (337C) wrote:

> Yes. The cache essentially serves as a smart virtual memory. I suppose
> that a question might be what is the optimum granularity of that I-cache.
> And would there be a better architecture for that cache vs. the data
> cache, that allowed more/faster access, because you KNOW that the dynamics
> of access are different.
Let's say you want to maximize on-die SRAM-type memory for a given transistor budget, so that you get, say, 80-90% total die yield. That lets you go to wafer-scale integration to reduce costs (dead dies are left in place and routed around) and to increase on-wafer mesh/torus throughput (thanks to much smaller geometries). You can handle object/method swapping to external, slower memory via the OS. Since that memory is mapped into the address space (effectively being a very large register file, or the zero page of a 65xx), you could also explicitly allocate it from within the code, at least as a preference, and fall back on garbage collection if explicit deallocation is a problem.

Notice that relativistic latency alone across a 300 mm wafer is 1-2 ns (aka the fabled Grace Hopper 30 cm nanosecond), so for cache coherency you'd sacrifice a lot of time, even across that small an area of silicon real estate.

Notice also that ARM or SHARC-like cores are very fat compared to minimalistic designs like the GA144:

http://www.designspark.com/blog/hands-on-with-a-144-core-processor

The journey goes even farther with

http://low-powerdesign.com/sleibson/2011/09/04/the-return-of-magnetic-memory-a-review-of-the-mram-panel-at-the-flash-memory-summit/

if combined with

http://apl.aip.org/resource/1/applab/v86/i1/p013502_s1?bypassSSO=1
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=4387374&contentType=Conference+Publications

This is a fully static FPGA, but without a crossbar, using only locally connected cells which are fully reconfigurable, also at runtime. There is no reason why this wouldn't work in 3D, a la serially deposited multilayers, or even full-volume 3D integration.

> Or, is the generic approach actually better in the long run, because it's
> less specialized and therefore less dependent on clever compiler and
> coding tricks to optimize performance.
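The 1-2 ns figure above can be sanity-checked with a quick back-of-envelope script. This is only a sketch: the 300 mm distance is the wafer diameter, the c/2 effective signal velocity is an assumed round number (real RC-limited on-chip wires are slower still), and the function name is mine, not anything standard.

```python
# Back-of-envelope check of the quoted 1-2 ns wafer-traversal latency.
# Assumptions: signal crosses the full 300 mm wafer diameter, travelling
# at between c (vacuum, best case) and an assumed c/2 effective velocity.

C = 299_792_458.0  # speed of light in vacuum, m/s


def traversal_ns(distance_m: float, fraction_of_c: float = 1.0) -> float:
    """One-way propagation time across distance_m, in nanoseconds."""
    return distance_m / (C * fraction_of_c) * 1e9


WAFER_M = 0.300  # 300 mm wafer diameter

print(f"{traversal_ns(WAFER_M):.2f} ns at c")       # ~1 ns, the Hopper nanosecond
print(f"{traversal_ns(WAFER_M, 0.5):.2f} ns at c/2")  # ~2 ns
```

So even under the most optimistic assumption, any coherency protocol spanning the wafer eats a nanosecond or two per crossing, which is the point being made.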
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
