I had forgotten about that... what kind of locking did he use on the decode
cache?  I still think that a shared decode cache is the right way to go;
given that all we ever do is add instructions (I don't think we ever delete
anything) then it might not be hard to do a lock-free structure.  If we do
more measurements we'll also have to make sure to take them at steady state,
as I'm sure there will be a lot of contention on the decode cache during
startup.
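
To make the lock-free idea concrete, here is a minimal sketch of an
insert-only table, assuming entries are never deleted (as suggested above)
and that a fixed capacity is acceptable. DecodedInst, DecodeCache,
fakeDecode, and lookupOrInsert are hypothetical names for illustration only;
this is not M5's actual decode cache or StaticInst interface.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a decoded instruction; not M5's StaticInst.
struct DecodedInst {
    std::uint64_t machInst;  // raw instruction bits, used as the key
    int opClass;             // placeholder payload
};

// Insert-only, fixed-capacity, open-addressed table. Slots go from null to
// non-null exactly once and are never cleared, so readers need no locks.
class DecodeCache {
  public:
    explicit DecodeCache(std::size_t capacity)
        : slots(capacity), mask(capacity - 1)
    {
        // Assumption: capacity is a nonzero power of two.
        assert(capacity != 0 && (capacity & mask) == 0);
        for (auto &s : slots)
            s.store(nullptr, std::memory_order_relaxed);
    }

    ~DecodeCache()
    {
        for (auto &s : slots)
            delete s.load(std::memory_order_relaxed);
    }

    // Lock-free lookup; returns nullptr on a miss.
    const DecodedInst *lookup(std::uint64_t machInst) const
    {
        std::size_t i = hash(machInst);
        for (std::size_t n = 0; n < slots.size(); ++n, i = (i + 1) & mask) {
            const DecodedInst *e = slots[i].load(std::memory_order_acquire);
            if (!e)
                return nullptr;
            if (e->machInst == machInst)
                return e;
        }
        return nullptr;
    }

    // Returns the cached entry, decoding and publishing it on a miss.
    // Returns nullptr only if the table is full.
    const DecodedInst *lookupOrInsert(std::uint64_t machInst)
    {
        DecodedInst *fresh = nullptr;
        std::size_t i = hash(machInst);
        for (std::size_t n = 0; n < slots.size(); ++n, i = (i + 1) & mask) {
            DecodedInst *e = slots[i].load(std::memory_order_acquire);
            if (!e) {
                if (!fresh)
                    fresh = new DecodedInst{machInst, fakeDecode(machInst)};
                if (slots[i].compare_exchange_strong(
                        e, fresh, std::memory_order_acq_rel,
                        std::memory_order_acquire))
                    return fresh;
                // Lost the race: e now points at the winner's entry.
            }
            if (e->machInst == machInst) {
                delete fresh;  // another thread already published this key
                return e;
            }
        }
        delete fresh;
        return nullptr;
    }

  private:
    // Placeholder for the real decode step.
    static int fakeDecode(std::uint64_t machInst)
    {
        return static_cast<int>(machInst & 0x7f);
    }

    std::size_t hash(std::uint64_t k) const
    {
        k ^= k >> 33;
        k *= 0xff51afd7ed558ccdULL;
        k ^= k >> 33;
        return static_cast<std::size_t>(k) & mask;
    }

    std::vector<std::atomic<DecodedInst *>> slots;
    std::size_t mask;
};
```

Because a slot is never cleared once set, two threads decoding the same
instruction cannot publish duplicates: the loser of the compare-and-swap
either finds the winner's entry in that slot or keeps probing. A growable
table would need more machinery than this sketch shows.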

I agree with Nate: FastAlloc should definitely be either replicated or
abandoned.
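
One way to read "replicated" is a separate free list per thread, so the
allocation fast path never touches shared state. The following is a rough
sketch under that assumption; PerThreadPool is a hypothetical name and does
not reflect how FastAlloc is actually written.

```cpp
#include <cstddef>
#include <new>

// Hypothetical per-thread free-list allocator for fixed-size blocks.
// Each thread has its own list, so no locking is needed on either path.
template <std::size_t Size>
class PerThreadPool {
    struct Node { Node *next; };
    static_assert(Size >= sizeof(Node), "block must be able to hold a link");

    static thread_local Node *freeList;

  public:
    static void *allocate()
    {
        if (Node *n = freeList) {
            freeList = n->next;         // reuse a block freed by this thread
            return n;
        }
        return ::operator new(Size);    // fall back to the global heap
    }

    static void deallocate(void *p)
    {
        // Link the block onto the calling thread's list. A block freed on a
        // different thread than it was allocated on simply migrates.
        freeList = new (p) Node{freeList};
    }
};

template <std::size_t Size>
thread_local typename PerThreadPool<Size>::Node *
    PerThreadPool<Size>::freeList = nullptr;
```

The trade-off is that freed blocks stay with whichever thread released them,
so memory can pile up on one thread if allocation and release are lopsided.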

Steve

On Mon, Oct 19, 2009 at 1:33 PM, Ali Saidi <[email protected]> wrote:

>
> Well, we don't actually have no data. Remember, Miles had multiple
> independent machines working in M5 at one point. If I remember the profiles
> correctly, a huge amount of time was spent contending for locks in the
> decode cache and the FastAlloc code.
>
> Ali
>
>
> On Mon, 19 Oct 2009 13:05:21 -0700, Steve Reinhardt <[email protected]>
> wrote:
> > On Mon, Oct 19, 2009 at 12:46 PM, nathan binkert <[email protected]> wrote:
> >
> >>
> >> > StaticInst::decode() Perhaps replication would be a good place to
> >> > start. I think the structure is just accessed too much to have any
> >> > kind of locking.
> >> I agree that replication can work, but I'd say we start with a rwlock
> >> and then replicate once we get a running system.
> >>
> >
> > I'm not convinced that replication is a win here... my expectation is
> > that this is a read-mostly structure; once the simulation is well
> > underway I think there would be very few new static instructions
> > encountered.  I don't know how large this thing gets; probably not large
> > enough to matter for physical memory capacity, but quite possibly large
> > enough that having multiple copies in a shared L3 cache could be
> > detrimental.  It could also possibly benefit from a prefetching effect if
> > one thread is decoding ahead of the others.  (Though if the simulated
> > cores are close to lock-step on the same piece of code, then there
> > probably won't be much of a win.)
> >
> > It's easy to speculate with no data, but I vote for starting with a
> > rwlock and seeing how it goes and then only moving toward replication if
> > it's a clear bottleneck in a long-running simulation.
> >
> > Steve
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>