I had forgotten about that... what kind of locking did he use on the decode cache? I still think that a shared decode cache is the right way to go; since all we ever do is add instructions (I don't think we ever delete anything), a lock-free structure might not be hard to build. If we do more measurements we'll also have to make sure to take them at steady state, as I'm sure there will be a lot of contention on the decode cache during startup.
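To make the lock-free idea concrete, here is a rough sketch of what an insert-only cache could look like. It is not M5 code: DecodedInst, LockFreeDecodeCache, the fixed table size, and the hash are all made up for illustration, and a real version would need a growth strategy. The point is only that if entries are never deleted or replaced, a reader can get by with an atomic load and a writer with a single compare-and-swap.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a decoded instruction; not M5's StaticInst.
struct DecodedInst {
    uint32_t machInst;
    explicit DecodedInst(uint32_t mi) : machInst(mi) {}
};

// Insert-only, fixed-capacity, open-addressing cache (assumed design).
// Slots go from null to non-null exactly once and are never changed
// afterwards, so lookups need no lock.
class LockFreeDecodeCache {
    static constexpr unsigned SlotBits = 12;            // assumed capacity: 4096
    static constexpr std::size_t Slots = std::size_t(1) << SlotBits;
    std::atomic<DecodedInst *> table[Slots];

    // Multiplicative hash; keeps the top SlotBits bits of the product.
    static std::size_t hash(uint32_t mi) {
        return (mi * 2654435761u) >> (32 - SlotBits);
    }

  public:
    LockFreeDecodeCache() {
        for (auto &s : table) s.store(nullptr, std::memory_order_relaxed);
    }
    ~LockFreeDecodeCache() {
        for (auto &s : table) delete s.load(std::memory_order_relaxed);
    }

    // Return the cached entry for mi, creating it on a miss.
    DecodedInst *lookup(uint32_t mi) {
        DecodedInst *fresh = nullptr;
        std::size_t i = hash(mi);
        for (std::size_t probes = 0; probes < Slots; ++probes) {
            DecodedInst *cur = table[i].load(std::memory_order_acquire);
            if (cur == nullptr) {
                if (!fresh) fresh = new DecodedInst(mi);
                // Try to claim the empty slot; on failure cur is updated
                // to the entry another thread installed first.
                if (table[i].compare_exchange_strong(
                        cur, fresh, std::memory_order_acq_rel,
                        std::memory_order_acquire))
                    return fresh;
            }
            if (cur->machInst == mi) {
                delete fresh;   // someone else already decoded it
                return cur;
            }
            i = (i + 1) & (Slots - 1);  // linear probing
        }
        delete fresh;
        assert(false && "decode cache full (sketch has no resize)");
        return nullptr;
    }
};
```

A thread that loses the race simply throws away its own copy and uses the winner's, so the worst case at startup is some duplicated decode work rather than blocking.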
I agree with Nate, FastAlloc should definitely either be replicated or abandoned.

Steve

On Mon, Oct 19, 2009 at 1:33 PM, Ali Saidi <[email protected]> wrote:
>
> Well, we don't actually have no data. Remember, Miles had multiple
> independent machines working in M5 at one point. If I remember the profiles
> correctly a huge amount of time was spent contending for locks in the
> decode cache and the FastAlloc code.
>
> Ali
>
>
> On Mon, 19 Oct 2009 13:05:21 -0700, Steve Reinhardt <[email protected]>
> wrote:
> > On Mon, Oct 19, 2009 at 12:46 PM, nathan binkert <[email protected]>
> > wrote:
> >
> >>
> >> > StaticInst::decode() Perhaps replication would be a good place to
> >> > start. I think the structure is just accessed too much to have any
> >> > kind of locking.
> >> I agree that replication can work, but I'd say we start with a rwlock
> >> and then replicate once we get a running system.
> >>
> >
> > I'm not convinced that replication is a win here... my expectation is
> > that this is a read-mostly structure; once the simulation is well
> > underway I think there would be very few new static instructions
> > encountered. I don't know how large this thing gets; probably not large
> > enough to matter for physical memory capacity, but quite possibly large
> > enough that having multiple copies in a shared L3 cache could be
> > detrimental. It could also possibly benefit from a prefetching effect if
> > one thread is decoding ahead of the others. (Though if the simulated
> > cores are close to lock-step on the same piece of code, then there
> > probably won't be much of a win.)
> >
> > It's easy to speculate with no data, but I vote for starting with a
> > rwlock and seeing how it goes and then only moving toward replication if
> > it's a clear bottleneck in a long-running simulation.
> >
> > Steve
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>
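For reference, the rwlock starting point proposed in the quoted thread could look roughly like the sketch below. This is an illustration only, not the actual StaticInst::decode() code: DecodedInst and RWLockedDecodeCache are hypothetical names, and it assumes C++17's std::shared_mutex as the reader/writer lock. Hits take only the shared lock, so a read-mostly steady state should see readers proceed in parallel; only a miss takes the exclusive lock.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical stand-in for a decoded instruction; not M5's StaticInst.
struct DecodedInst {
    uint32_t machInst;
    explicit DecodedInst(uint32_t mi) : machInst(mi) {}
};

// Shared decode cache guarded by a reader/writer lock (assumed design).
class RWLockedDecodeCache {
    std::shared_mutex lock;
    std::unordered_map<uint32_t, std::unique_ptr<DecodedInst>> cache;

  public:
    // Return the cached entry for mi, creating it on a miss.
    DecodedInst *lookup(uint32_t mi) {
        {
            // Fast path: many threads may hold the shared lock at once.
            std::shared_lock<std::shared_mutex> rd(lock);
            auto it = cache.find(mi);
            if (it != cache.end())
                return it->second.get();
        }
        // Slow path: take the exclusive lock and re-check, since another
        // thread may have inserted the entry after we dropped the shared lock.
        std::unique_lock<std::shared_mutex> wr(lock);
        auto &slot = cache[mi];
        if (!slot)
            slot = std::make_unique<DecodedInst>(mi);
        return slot.get();
    }
};
```

Entries are heap-allocated and never erased, so a pointer returned by lookup() stays valid even if the map rehashes later.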
