Hi Nilay,

Great work indeed! All we need now is a model of an Ethernet switch for
some serious-scale parallelism :-)

Andreas

On 03/02/2013 22:53, "Steve Reinhardt" <ste...@gmail.com> wrote:

>Excellent!  Congratulations!
>
>Steve
>
>
>On Sun, Feb 3, 2013 at 11:16 AM, Nilay <ni...@cs.wisc.edu> wrote:
>
>> The netperf test is running. It was good that Ali brought up this email
>> from 5-6 years back. The very first error I encountered was due to the
>> decode cache being shared. For the time being I have taken the easy way
>> of making the cache a per-decoder object, instead of it being shared
>> amongst the decoders. Apart from that, I really did not face any
>> problems.
>>
>> The etherlink object is provided with the sim objects at its two ends.
>> When it needs to schedule the event that moves a packet from one end to
>> the other, it schedules the event using the sim object at the other
>> end. The code in the event queue class decides which queue the event
>> should be scheduled on.
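>>
>> In outline, the scheduling path looks like this (a sketch with
>> hypothetical names; the real gem5 interface differs):
>>
>>     #include <cstdint>
>>
>>     class Event {};
>>
>>     class EventQueue {
>>       public:
>>         void schedule(Event *e, uint64_t when) {
>>             // enqueue e at tick 'when' on this queue
>>         }
>>     };
>>
>>     class SimObjectStub {
>>       public:
>>         explicit SimObjectStub(EventQueue *q) : eventq(q) {}
>>         EventQueue *eventQueue() const { return eventq; }
>>       private:
>>         EventQueue *eventq;
>>     };
>>
>>     class EtherLinkStub {
>>       public:
>>         EtherLinkStub(SimObjectStub *a, SimObjectStub *b)
>>             : end0(a), end1(b) {}
>>         // A packet travelling from end0 to end1 is delivered by an
>>         // event scheduled on the *receiving* object's queue.
>>         void sendPacket(Event *rxDone, uint64_t when) {
>>             end1->eventQueue()->schedule(rxDone, when);
>>         }
>>       private:
>>         SimObjectStub *end0, *end1;
>>     };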
>>
>> One thing that does not seem to be working correctly right now is the
>> automatic deletion of the per-queue events that form a global event. We
>> can handle that in due course.
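>>
>> For reference, the ownership problem is roughly the following (a
>> sketch, not the actual gem5 code):
>>
>>     #include <atomic>
>>
>>     class GlobalEventStub {
>>       public:
>>         explicit GlobalEventStub(int numQueues) : remaining(numQueues) {}
>>         // Each per-queue event calls this when it is processed; the
>>         // last one should clean up the whole group. It is this final
>>         // deletion that is not happening automatically yet.
>>         void oneDone() {
>>             if (remaining.fetch_sub(1) == 1)
>>                 delete this;
>>         }
>>       private:
>>         std::atomic<int> remaining;
>>     };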
>>
>> I think it is working fine.
>>
>> --
>> Nilay
>>
>> On Fri, February 1, 2013 3:14 pm, Steve Reinhardt wrote:
>> > Forwarding this thread to the dev list since others in the community
>> might
>> > be interested (both in the technical issues and in the fact that
>>someone
>> > is
>> > working in this direction).
>> >
>> > Steve
>> >
>> >
>> > On Fri, Feb 1, 2013 at 9:01 AM, Steve Reinhardt <ste...@gmail.com>
>> wrote:
>> >
>> >> Glad you found this, Ali.
>> >>
>> >> As far as the decode cache: a great big lock is certainly adequate
>> >> just to bring things up and get it working.  In the longer term, we
>> >> should take advantage of the fact that the decode cache is read-mostly
>> >> (or at least it should be... if it's not, we have bigger problems) to
>> >> do something more intelligent.  I'm guessing it would be possible to
>> >> make the decode cache lock-free using cmpxchg; if not, some sort of
>> >> medium-grain multiple-reader-single-writer locking scheme could also
>> >> work.  But those optimizations should be left for later; I just wanted
>> >> to bring them up now for the record while I was thinking of them.  In
>> >> particular, I think making the decode cache per-thread is the wrong
>> >> way to go.
>> >>
>> >> Steve
>> >>
>> >>
>> >>
>> >> On Fri, Feb 1, 2013 at 8:31 AM, Ali Saidi <sa...@umich.edu> wrote:
>> >>
>> >>>
>> >>> Hi Nilay,
>> >>>
>> >>>
>> >>>
>> >>> I finally found an email which I've been looking for since the last
>> >>> email you sent about running multiple systems in gem5. An undergrad
>> >>> named Miles got two systems running in gem5 (in 2007). None of the
>> >>> diffs are useful at this point, since everything has changed, but in
>> >>> the process he did identify the areas that he had to lock around to
>> >>> make multiple systems work. I'm not sure if you've gotten past this
>> >>> point yet, but here are the areas he identified and "fixed." The fix
>> >>> was just a great-big-lock around each of them, which for the decode
>> >>> cache really hurt performance.
>> >>>
>> >>> FastAlloc: gone, so no problem, and tcmalloc at least is thread-safe.
>> >>>
>> >>> RefCount: I'm not sure if this is still a problem or not. If the
>> >>> pointers you're going to exchange are reference counted, they could
>> >>> be. Certainly another issue (see below) is refcounting of
>> >>> instructions. This might be the biggest reason to move toward C++11
>> >>> pointers. Miles ended up using gcc intrinsics
>> >>> (__atomic_compare_exchange() on the incref/decref members), although
>> >>> there are now C++11 std::atomic operations and __atomic_fetch_add(),
>> >>> which are probably more useful than having to write a while loop for
>> >>> the compare and exchange.
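>> >>>
>> >>> For instance, roughly (a sketch; gem5's actual RefCounted class
>> >>> differs):
>> >>>
>> >>>     #include <atomic>
>> >>>
>> >>>     class RefCounted {
>> >>>       public:
>> >>>         void incref() {
>> >>>             count.fetch_add(1, std::memory_order_relaxed);
>> >>>         }
>> >>>         void decref() {
>> >>>             // acq_rel ordering makes the final delete safe across
>> >>>             // threads; no hand-written CAS loop is needed.
>> >>>             if (count.fetch_sub(1, std::memory_order_acq_rel) == 1)
>> >>>                 delete this;
>> >>>         }
>> >>>         virtual ~RefCounted() {}
>> >>>       private:
>> >>>         std::atomic<int> count{1};
>> >>>     };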
>> >>>
>> >>> Stream output (e.g., DPRINTFs from multiple threads).
>> >>>
>> >>> Decode Cache: since it can be shared across threads (perhaps it
>> >>> shouldn't be, or maybe it should be), and the STL structures aren't
>> >>> thread-safe by default.
>> >>>
>> >>>
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Ali
>> >>>
>> >>>
>>



_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
