> execution models to share instruction code, but splitting L2 data
> across cores is bound to be a destructive use of the cache in any
> data parallel model.  Obviously, user control of the cache is a large

"data parallel model" basically means you're streaming in/out of DRAM,
right?  why are these cases not nicely covered by the cache-placement
instructions implemented in MMX and its follow-ons?  you can control
how a load or store behaves wrt the different levels of cache.
IIRC, Intel introduced some new stuff to make the cache shared
by cores more effective this way (per-core victim traffic writes through?)
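
to make that concrete, here is a minimal sketch of the kind of streaming
loop I have in mind, using the SSE intrinsics from <xmmintrin.h>.  the
function name scale_stream, the prefetch distance of 64 floats, and the
alignment requirements are my own assumptions for illustration; they are
not tuned for any particular chip.  loads are hinted as non-temporal with
_mm_prefetch(..., _MM_HINT_NTA) and stores use _mm_stream_ps, which is a
hint that the written data should not be kept in cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_prefetch, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper: dst[i] = a * src[i] for a buffer that is streamed
 * once and not reused.  Assumes n is a multiple of 4 and that both
 * pointers are 16-byte aligned (required by _mm_load_ps/_mm_stream_ps). */
void scale_stream(float *dst, const float *src, size_t n, float a)
{
    assert(n % 4 == 0);
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);

    const __m128 va = _mm_set1_ps(a);
    for (size_t i = 0; i < n; i += 4) {
        /* Non-temporal prefetch hint for data we will touch only once.
         * The distance (64 floats ahead) is an untuned assumption. */
        if (i + 64 < n)
            _mm_prefetch((const char *)(src + i + 64), _MM_HINT_NTA);

        __m128 v = _mm_mul_ps(_mm_load_ps(src + i), va);

        /* Non-temporal store: asks the hardware to write the result out
         * without keeping the line in cache. */
        _mm_stream_ps(dst + i, v);
    }
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
}
```

both the prefetch and the streaming store are hints, so how much they
actually keep out of a shared L2 depends on the implementation.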

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
