Well, it's not as though we have no data. Remember, Miles had multiple independent machines running in M5 at one point. If I remember the profiles correctly, a huge amount of time was spent contending for locks in the decode cache and the FastAlloc code.
Ali

On Mon, 19 Oct 2009 13:05:21 -0700, Steve Reinhardt <[email protected]> wrote:
> On Mon, Oct 19, 2009 at 12:46 PM, nathan binkert <[email protected]> wrote:
>>
>> > StaticInst::decode() Perhaps replication would be a good place to
>> > start. I think the structure is just accessed too much to have any kind of
>> > locking.
>>
>> I agree that replication can work, but I'd say we start with a rwlock
>> and then replicate once we get a running system.
>>
>
> I'm not convinced that replication is a win here... my expectation is that
> this is a read-mostly structure; once the simulation is well underway I
> think there would be very few new static instructions encountered. I don't
> know how large this thing gets; probably not large enough to matter for
> physical memory capacity, but quite possibly large enough that having
> multiple copies in a shared L3 cache could be detrimental. It could also
> possibly benefit from a prefetching effect if one thread is decoding ahead
> of the others. (Though if the simulated cores are close to lock-step on the
> same piece of code, then there probably won't be much of a win.)
>
> It's easy to speculate with no data, but I vote for starting with a rwlock
> and seeing how it goes and then only moving toward replication if it's a
> clear bottleneck in a long-running simulation.
>
> Steve

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
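For illustration, the "start with a rwlock" option for a read-mostly decode cache could look roughly like the C++17 sketch below. It is hypothetical throughout. `DecodeCache`, `StaticInst`, and `decodeSlow()` are made-up stand-ins, not M5's real classes or decoder, and `std::shared_mutex` is assumed as the lock rather than whatever primitive M5 would actually use. Hits take only the shared (reader) lock. A miss decodes without holding any lock, then takes the exclusive (writer) lock and keeps whichever entry got inserted first if another thread raced it.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a decoded static instruction (not M5's StaticInst).
struct StaticInst {
    uint64_t machInst;
    std::string mnemonic;
};

using StaticInstPtr = std::shared_ptr<const StaticInst>;

// Hypothetical slow-path decoder, a placeholder for the real ISA decoder.
inline StaticInstPtr decodeSlow(uint64_t machInst) {
    return std::make_shared<const StaticInst>(
        StaticInst{machInst, "inst_" + std::to_string(machInst)});
}

// Read-mostly decode cache guarded by a reader/writer lock.
class DecodeCache {
  public:
    StaticInstPtr decode(uint64_t machInst) {
        {
            // Fast path: any number of threads may hold the shared lock.
            std::shared_lock<std::shared_mutex> readLock(mutex_);
            auto it = cache_.find(machInst);
            if (it != cache_.end())
                return it->second;
        }

        // Miss: decode outside the lock so readers are not stalled.
        StaticInstPtr fresh = decodeSlow(machInst);

        // Exclusive lock only for the insert. If another thread inserted
        // the same key in the meantime, try_emplace leaves that entry in
        // place and we return it, so all callers share one object.
        std::unique_lock<std::shared_mutex> writeLock(mutex_);
        auto result = cache_.try_emplace(machInst, std::move(fresh));
        return result.first->second;
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> readLock(mutex_);
        return cache_.size();
    }

  private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<uint64_t, StaticInstPtr> cache_;
};
```

Once most instructions have been seen, nearly every call takes only the shared lock, so readers do not block one another. In typical rwlock implementations, though, each shared acquire still performs an atomic write to the lock's state, so heavily shared lookups can keep bouncing that cache line between cores. That is one way lock contention like the kind Ali recalls could persist even with an rwlock, and a reason the replication alternative stays on the table.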
