nathan binkert wrote:
> Hi Stijn,
>
> I finally got around to reading this thread on parallelizing M5.  It
> seems that you have a reasonably good handle on the issues involved.
> Just so we're clear, the plan all along has been to parallelize M5.
> Ali mentioned the change I made about a year ago, where I gave every
> object a private pointer to an event queue that it would schedule on.
> The idea is that each object would be bound to a particular event
> queue and would schedule events on that queue.  Event queues would be
> bound to host cores.  Only a few objects would schedule events on
> multiple queues.  We are currently planning to only support shared
> memory machines and aren't planning to use a cluster to simulate a
> system.
>
>   

Neither am I. Maybe in a distant future.
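
For concreteness, this is roughly how I picture the binding scheme you
describe: every simulated object holds a pointer to the one queue it
schedules on, and each queue is serviced by a host thread pinned to a host
core. A minimal sketch (the names below are my own guesses, not the actual
M5 interfaces):

// Sketch of per-object event queue binding; hypothetical names only.
#include <cstdint>
#include <queue>
#include <vector>

typedef uint64_t Tick;

struct Event {
    Tick when;
    virtual void process() = 0;
    virtual ~Event() {}
};

struct EarliestFirst {
    bool operator()(const Event *a, const Event *b) const {
        return a->when > b->when;   // smallest tick on top
    }
};

// One EventQueue per host core; a single host thread drains each queue.
class EventQueue {
    std::priority_queue<Event *, std::vector<Event *>, EarliestFirst> events;
  public:
    void schedule(Event *e) { events.push(e); }
};

// Every object is handed its queue at construction and always schedules
// its own events there, so its state is only ever touched by one thread.
class SimObject {
  protected:
    EventQueue *eventq;
  public:
    explicit SimObject(EventQueue *q) : eventq(q) {}
    void schedule(Event *e) { eventq->schedule(e); }
};

If that matches what you already have in the tree, then most of what I want
to do is on the synchronization side rather than the scheduling side.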

>> The thing is that a quantum of 40 cycles is probably not going to be
>> sufficient for the simulations I will be running.
>>     
> What do you base this assertion on?  You said that you'd be running
> hundreds of cores.  This means that you're going to have 10s of
> simulated cores mapped onto every real core.  I wouldn't jump to
> conclusions on this before we have any idea how things perform.
>
>   

As I said in my earlier mails, I'm not going to be using detailed CPU
simulation but a CPU model that simulates the intervals between miss
events. This type of simulation is an order of magnitude faster than
detailed CPU simulation. So I'm "guessing" that a CPU, when only working
on local resources, can run ahead hundreds of cycles. With a quantum of
40 it's going to be blocked most of the time when it accesses a shared
resource. My other guess is that the inherent parallelism can be
exploited well enough that I won't even need a quantum, whereas
quantum-based simulators usually don't exploit this and just synchronize
every 'quantum' cycles. I'm also assuming that the number of simulated
cores will roughly equal the number of physical cores. If that isn't the
case, it doesn't really matter how often threads block, since every
physical processor can always find useful work to do (not accounting for
context-switch overhead).
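
To make that concern concrete, here is a toy comparison of the two schemes
I'm weighing against each other (nothing to do with the real M5 code, and
all numbers are made up): a fixed-quantum barrier versus synchronizing only
when a shared resource is actually touched.

// Toy sketch only; hypothetical, not M5 code.
#include <atomic>
#include <barrier>     // C++20
#include <cstdint>

typedef uint64_t Tick;

const Tick quantum = 40;

// Scheme 1: quantum-based.  Every thread stops at every 40-cycle boundary,
// whether or not it touched anything shared.  An interval-model CPU that
// could have run ahead hundreds of cycles blocks here again and again.
void runQuantum(std::barrier<> &sync, Tick end)
{
    for (Tick now = 0; now < end; now += quantum) {
        // ... simulate [now, now + quantum) on purely local state ...
        sync.arrive_and_wait();      // wait for every other thread
    }
}

// Scheme 2: synchronize on demand.  The thread advances its local clock
// freely between miss events and only waits for the others when it is
// about to touch a shared resource (bus, shared cache), which in an
// interval model may be hundreds of cycles away.  'slowest' stands for
// the minimum local clock over the other threads, maintained elsewhere.
void runOnDemand(std::atomic<Tick> &slowest, Tick end)
{
    Tick now = 0;
    while (now < end) {
        now += 300;                  // made-up distance to the next miss
        while (slowest.load(std::memory_order_acquire) < now) {
            // spin or yield until the laggards catch up
        }
        // ... perform the shared access at tick 'now' ...
    }
}

In the first scheme the waiting happens end/quantum times no matter what;
in the second it happens only at shared accesses, which is where I hope
most of the win is.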

>> I'm guessing that simulation threads, if I were to let them run freely, 
>> would frequently be
>> hundreds of cycles apart.
>>     
> Sure, but the real question is how much overhead you are adding by
> keeping things more closely in sync.
>
>   

If you have about as many physical processors as simulated ones, then the
question is rather how much time some processors spend blocked. I know
there aren't any CPUs out there that can run 1000 threads, but if there
were, this would matter. What if you could run M5 on a GPGPU? Of course I
will try to keep overhead and context switches to a minimum.

>> The problem with this is that events would have to be able to say which
>> state objects they will be affecting (= on which eventqueues they will be
>> scheduling events), requiring further code changes.
>>     
> This is already how we are planning on doing things and the SimObject
> hierarchy lends itself nicely to this.
>   
Great! If you have any specific ideas I will be glad to hear them. I
plan to dive into the code soon. I will mail my ideas to the list once I
have a concrete vision, and I would appreciate your advice.
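
To give you an idea of the kind of interface I've been picturing for the
cross-queue case, where an object has to schedule an event on a queue owned
by another host thread (again, purely hypothetical names, not a proposal
for the real API):

// Hypothetical sketch of cross-queue scheduling: when an object touches
// state owned by another host thread, the event goes on the owner's queue,
// so that queue stays the single point of serialization for that state.
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <vector>

typedef uint64_t Tick;

struct Event {
    Tick when;
    std::function<void()> action;
};

class EventQueue {
    struct Later {
        bool operator()(const Event &a, const Event &b) const {
            return a.when > b.when;
        }
    };
    std::priority_queue<Event, std::vector<Event>, Later> events;
    std::mutex lock;
  public:
    void schedule(const Event &e) {
        std::lock_guard<std::mutex> guard(lock);
        events.push(e);
    }
};

class SimObject {
    EventQueue *eventq;          // queue this object is bound to
  public:
    explicit SimObject(EventQueue *q) : eventq(q) {}
    EventQueue *queue() { return eventq; }
  protected:
    void scheduleLocal(const Event &e) { eventq->schedule(e); }
};

// A cache whose misses are serviced by a memory controller that may be
// bound to a different queue (and hence a different host thread).
class Cache : public SimObject {
    SimObject *mem;
  public:
    Cache(EventQueue *q, SimObject *m) : SimObject(q), mem(m) {}
    void handleMiss(Tick now) {
        // local bookkeeping stays on our own queue ...
        scheduleLocal({now + 1, []{ /* update MSHRs, tags, ... */ }});
        // ... but the actual access is scheduled on the memory's queue.
        mem->queue()->schedule({now + 100, []{ /* perform the access */ }});
    }
};

In this naive form every insertion takes the lock; a real implementation
would want the common, same-queue path to stay cheap, for example by
keeping a separate list for remote insertions that the owning thread drains
at a safe point.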



>   Nate
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
