> Also, NUMA effects are more important in practice on big multicores. Some
> of the off-chip delays are brutal.
yeah, we've been talking about this on #cat-v. even inside one CPU package, amd puts multiple dies nowadays, and the cross-die cpu cache access delays are approaching the same order of magnitude as main-memory access! also, on each die they have what they call a ccx (cpu complex): a group of 4 cores that are connected to each other much faster than to the cores in the other ccx.