Well, we don't actually have no data. Remember, Miles had multiple
independent machines working in M5 at one point. If I remember the profiles
correctly, a huge amount of time was spent contending for locks in the
decode cache and the FastAlloc code.
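
For concreteness, here is a rough sketch of the rwlock option discussed
below. This is not M5 code: DecodedInst and DecodeCache are hypothetical
stand-ins, and it assumes a C++17 compiler for std::shared_mutex. Lookups
take the lock in shared mode, and only a miss takes it exclusively, so a
read-mostly cache should see little writer contention once it is warm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical stand-in for a decoded instruction; not M5's StaticInst.
struct DecodedInst {
    std::uint32_t machInst;
};

// Hypothetical read-mostly decode cache guarded by a reader-writer lock.
class DecodeCache {
  public:
    std::shared_ptr<const DecodedInst> decode(std::uint32_t machInst) {
        {
            // Fast path: many threads may hold the shared lock at once.
            std::shared_lock<std::shared_mutex> rd(lock_);
            auto it = cache_.find(machInst);
            if (it != cache_.end())
                return it->second;
        }
        // Slow path: take the lock exclusively. try_emplace re-checks the
        // key, so a thread that lost the race reuses the winner's entry.
        std::unique_lock<std::shared_mutex> wr(lock_);
        auto [it, inserted] = cache_.try_emplace(machInst);
        if (inserted)
            it->second =
                std::make_shared<const DecodedInst>(DecodedInst{machInst});
        assert(it->second != nullptr);
        return it->second;
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> rd(lock_);
        return cache_.size();
    }

  private:
    mutable std::shared_mutex lock_;
    std::unordered_map<std::uint32_t,
                       std::shared_ptr<const DecodedInst>> cache_;
};
```

Even the shared-mode acquire writes to the lock word, so this can still
bounce a cache line between cores on every lookup. That is the kind of cost
that profiling, rather than speculation, would have to settle.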

Ali


On Mon, 19 Oct 2009 13:05:21 -0700, Steve Reinhardt <[email protected]> wrote:
> On Mon, Oct 19, 2009 at 12:46 PM, nathan binkert <[email protected]> wrote:
> 
>>
>> > StaticInst::decode() Perhaps replication would be a good place to
>> > start. I think the structure is just accessed too much to have any
>> > kind of locking.
>>
>> I agree that replication can work, but I'd say we start with a rwlock
>> and then replicate once we get a running system.
>>
> 
> I'm not convinced that replication is a win here... my expectation is that
> this is a read-mostly structure; once the simulation is well underway I
> think there would be very few new static instructions encountered.  I don't
> know how large this thing gets; probably not large enough to matter for
> physical memory capacity, but quite possibly large enough that having
> multiple copies in a shared L3 cache could be detrimental.  It could also
> possibly benefit from a prefetching effect if one thread is decoding ahead
> of the others.  (Though if the simulated cores are close to lock-step on the
> same piece of code, then there probably won't be much of a win.)
> 
> It's easy to speculate with no data, but I vote for starting with a rwlock
> and seeing how it goes and then only moving toward replication if it's a
> clear bottleneck in a long-running simulation.
> 
> Steve
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
