The netperf test is running. It was good that Ali brought up this email
from 5-6 years back. The very first error I encountered was due to the
decode cache being shared. For the time being I have taken the easy way
out and made the cache a per-decoder object instead of sharing it
amongst the decoders. Apart from that, I really did not face any problems.
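
Roughly, the change amounts to the following (a simplified sketch with
stand-in names, not the actual diff against gem5's decoder):

    // Sketch: moving the decode cache from a shared/static object to a
    // plain per-decoder member, so concurrent decoders no longer race.
    #include <cstdint>
    #include <unordered_map>

    struct StaticInstStub {};  // stand-in for the real decoded instruction

    class DecoderStub
    {
        // Previously this was shared amongst decoders; as an ordinary
        // member, each decoder instance now owns a private cache.
        std::unordered_map<uint64_t, StaticInstStub *> decodeCache;

      public:
        StaticInstStub *
        decode(uint64_t machInst)
        {
            auto it = decodeCache.find(machInst);
            if (it != decodeCache.end())
                return it->second;            // hit: reuse earlier decode
            auto *inst = new StaticInstStub;  // miss: build and cache it
            decodeCache[machInst] = inst;
            return inst;
        }
    };

The cost is duplicated decode work across decoders, which is why the
shared, read-mostly design Steve suggests below is the better long-term
answer.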

The etherlink object was provided with the sim objects at its two ends.
When the etherlink object needs to schedule the event that moves a
packet from one end to the other, it schedules the event using the sim
object at the other end. The code in the event queue class then decides
the right queue on which to schedule this event.
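
In outline, the path looks something like this (a sketch with stand-in
names; the real EtherLink and event queue code differ in detail):

    // Sketch: the link knows the sim objects at both ends and schedules
    // the packet-arrival event via the *receiving* end's sim object, so
    // the event lands on that end's (possibly different) event queue.
    #include <cstdint>
    #include <vector>

    using Tick = uint64_t;

    struct EventQueueStub
    {
        std::vector<Tick> pending;      // stand-in for the real event list
        void schedule(Tick when) { pending.push_back(when); }
    };

    struct SimObjectStub
    {
        EventQueueStub *eq;             // the queue this object runs on
        EventQueueStub *eventQueue() { return eq; }
    };

    struct EtherLinkStub
    {
        SimObjectStub *endA, *endB;     // sim objects at the two ends
        Tick curTick = 0, linkDelay = 1000;

        void
        send(SimObjectStub *src)
        {
            // Pick the opposite end and hand the event to *its* queue;
            // this is the decision the event queue code makes for us.
            SimObjectStub *dst = (src == endA) ? endB : endA;
            dst->eventQueue()->schedule(curTick + linkDelay);
        }
    };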

One thing that does not seem to be working correctly right now is the
automatic deletion of the per-queue events that form a global event. We
can handle that in due course.
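
The structure involved is roughly the following (a sketch of my
understanding, not the actual global event code): the global event owns
one small sub-event per event queue, and the missing piece is having
the last sub-event to run clean the whole thing up.

    // Sketch: a global event composed of per-queue sub-events. Deletion
    // is the open problem: the parent may only be freed after the last
    // queue has processed its piece, e.g. via an atomic countdown.
    #include <atomic>
    #include <vector>

    struct GlobalEventStub
    {
        struct SubEvent { GlobalEventStub *parent; };  // one per queue

        std::vector<SubEvent> parts;
        std::atomic<int> remaining;     // queues yet to run their part

        explicit GlobalEventStub(int numQueues)
            : parts(numQueues, SubEvent{this}), remaining(numQueues) {}

        void
        subEventDone()
        {
            // fetch_sub returns the old count; 1 means this was the
            // last sub-event, so the global event can be reclaimed.
            if (remaining.fetch_sub(1) == 1)
                delete this;
        }
    };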

I think it is working fine.

--
Nilay

On Fri, February 1, 2013 3:14 pm, Steve Reinhardt wrote:
> Forwarding this thread to the dev list since others in the community might
> be interested (both in the technical issues and in the fact that someone
> is
> working in this direction).
>
> Steve
>
>
> On Fri, Feb 1, 2013 at 9:01 AM, Steve Reinhardt <ste...@gmail.com> wrote:
>
>> Glad you found this, Ali.
>>
>> As far as the decode cache: a great big lock is certainly adequate just
>> to
>> bring things up and get it working.  In the longer term, we should take
>> advantage of the fact that the decode cache is read-mostly (or at least
>> it
>> should be... if it's not we have bigger problems) to do something more
>> intelligent.  I'm guessing it would be possible to make the decode cache
>> lock-free using cmpxchg; if not, some sort of medium-grain
>> multiple-reader-single-writer locking scheme could also work.  But those
>> optimizations should be left for later; I just wanted to bring them up
>> now
>> for the record while I was thinking of them.  In particular, I think
>> making
>> the decode cache per-thread is the wrong way to go.
>>
>> Steve
>>
>>
>>
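
For the record, the cmpxchg approach Steve describes above could look
something like this (a sketch that assumes a fixed-size, direct-mapped
table; none of it is gem5's actual decoder code):

    // Sketch: a lock-free, read-mostly decode cache. Readers only load
    // the slot; on a miss the decoder builds the instruction and tries
    // to install it with a compare-exchange. Losing the race costs one
    // wasted allocation.
    #include <array>
    #include <atomic>
    #include <cstddef>
    #include <cstdint>

    struct DecodedInstStub { uint64_t machInst; };

    class LockFreeDecodeCacheStub
    {
        static constexpr std::size_t kSlots = 4096;
        std::array<std::atomic<DecodedInstStub *>, kSlots> slots{};

      public:
        DecodedInstStub *
        lookupOrInsert(uint64_t machInst)
        {
            auto &slot = slots[machInst % kSlots];
            DecodedInstStub *cur = slot.load(std::memory_order_acquire);
            if (cur && cur->machInst == machInst)
                return cur;                  // common read-only fast path

            auto *fresh = new DecodedInstStub{machInst};
            DecodedInstStub *expected = nullptr;
            if (slot.compare_exchange_strong(expected, fresh,
                                             std::memory_order_acq_rel))
                return fresh;                // we installed our decode
            if (expected->machInst == machInst) {
                delete fresh;                // lost the race; reuse theirs
                return expected;
            }
            return fresh;  // conflicting entry: return uncached decode
                           // (a real design would handle replacement)
        }
    };

Readers never take a lock here, which fits the read-mostly behaviour
Steve points out; the multiple-reader-single-writer scheme would be the
fallback if the cache turns out to need in-place updates.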
>> On Fri, Feb 1, 2013 at 8:31 AM, Ali Saidi <sa...@umich.edu> wrote:
>>
>>>
>>> Hi Nilay,
>>>
>>>
>>>
>>> I finally found an email which I've been looking for since the last
>>> email you sent about running multiple systems in gem5. An undergrad
>>> named Miles got two systems running in gem5 (in 2007). None of the
>>> diffs are useful at this point, since everything has changed, but in
>>> the process he did identify the areas that he had to lock around to
>>> make multiple systems work. I'm not sure if you've gotten past this
>>> point yet, but these are the areas he identified and "fixed." The fix
>>> was just a great-big-lock around each of them, which for the decode
>>> cache really hurt performance.
>>>
>>> FastAlloc: gone, so no problem, and tcmalloc at least is thread-safe
>>>
>>> RefCount: I'm not sure if this is still a problem or not. If the
>>> pointers you're going to exchange are reference counted, they could
>>> be. Certainly another issue (see below) is refcounting of
>>> instructions. This might be the biggest reason to move toward C++11
>>> pointers. Miles ended up using gcc intrinsics
>>> (__atomic_compare_and_exchange() on the incref/decref members),
>>> although there are now C++11 std::atomic and __atomic_fetch_and_add(),
>>> which are probably more useful than having to write a while loop for
>>> the compare and exchange.
>>>
>>> Stream output (e.g. DPRINTFs from multiple threads)
>>>
>>> Decode Cache: since it can be shared across threads (perhaps it
>>> shouldn't be, or maybe it should be), and the STL structures aren't
>>> thread-safe by default.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Ali
>>>
>>>
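
And for completeness, the fetch-and-add form of the refcounting fix Ali
mentions is essentially this (a sketch, not gem5's actual RefCounted
class):

    // Sketch: thread-safe incref/decref using an atomic counter.
    // std::atomic's fetch_add/fetch_sub are the C++11 counterparts of
    // GCC's __atomic_fetch_add builtin and avoid the CAS retry loop.
    #include <atomic>

    class RefCountedStub
    {
        mutable std::atomic<int> count{0};

      public:
        virtual ~RefCountedStub() = default;

        void incref() const { count.fetch_add(1, std::memory_order_relaxed); }

        void
        decref() const
        {
            // fetch_sub returns the previous value; 1 means this was
            // the last reference, so the object can be destroyed.
            if (count.fetch_sub(1, std::memory_order_acq_rel) == 1)
                delete this;
        }
    };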
