> execution models to share instruction code, but splitting L2 data
> across cores is bound to be a destructive use of the cache in any
> data parallel model. Obviously, user control of the cache is a large
"data parallel model" basically means you're streaming in/out of dram, right? why are these cases not nicely covered by the placement instructions implemented in mmx and follow-ons? you can control how a load or store behaves wrt different levels of cache. IIRC, Intel introduced some new stuff to make the cache shared by cores more effective this way (per-core victim traffic writes through?)
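for concreteness, here is a rough sketch of the kind of placement control I mean, using the SSE/SSE2 intrinsics (the non-temporal hints arrived with the SSE follow-ons to mmx). the names stream_copy and stream_copy_demo, the 256-byte prefetch distance and the 16-byte alignment requirement are my own assumptions for illustration, not anything from the thread, and whether the hints actually help depends on the particular cpu. on a non-x86 build it just falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in the SSE header */
#endif

/* Hypothetical helper: copy n bytes as a stream.
 * Assumes dst and src are 16-byte aligned and n is a multiple of 16. */
void stream_copy(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);
    assert(n % 16 == 0);
#if defined(__SSE2__)
    const char *s = (const char *)src;
    char *d = (char *)dst;
    for (size_t i = 0; i < n; i += 16) {
        /* Hint that upcoming source data is non-temporal, so the
         * hardware may keep it from displacing the cached working set.
         * The 256-byte distance is an arbitrary, untuned guess. */
        if (i + 256 < n)
            _mm_prefetch(s + i + 256, _MM_HINT_NTA);
        __m128i v = _mm_load_si128((const __m128i *)(s + i));
        /* Non-temporal store: write toward memory with a hint not to
         * allocate the destination line in the cache hierarchy. */
        _mm_stream_si128((__m128i *)(d + i), v);
    }
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#else
    memcpy(dst, src, n); /* portable fallback, no placement hints */
#endif
}

/* Hypothetical self-check: returns 1 if the copy matches the source. */
int stream_copy_demo(void)
{
    static _Alignas(16) unsigned char src[64];
    static _Alignas(16) unsigned char dst[64];
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 3 + 1);
    memset(dst, 0, sizeof dst);
    stream_copy(dst, src, sizeof src);
    return memcmp(dst, src, sizeof src) == 0;
}
```

the point is only that the streaming side of a data-parallel kernel can be tagged per load/store, so in principle it need not evict whatever the other core is keeping in a shared L2.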
