Dear developers,

I'm a senior computer science student at Ghent University, starting work
on my master's thesis. I'm working on the simulation of many-core
systems together with researchers from the Department of Electronics and
Information Systems. By many cores, I mean hundreds of cores.
We think we will be able to simulate these cores by combining two
techniques: non-cycle-accurate core simulation based on interval
simulation, and parallelization. The former is largely taken care of; the
latter is what I will be working on. In other words, my job is to scale
the simulator up to many cores and to address any accuracy and
synchronization problems along the way. Prior work was implemented as an
interval-simulation CPU model in the m5-1.1 simulator, and we plan to
implement future work in version 2 of m5. At this early stage I have
several questions, mostly about how easily m5 can be modified, which is
why I'm writing to you. I have no experience coding for m5 yet, so I
would appreciate your thoughts on this matter.
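
For readers unfamiliar with interval simulation, here is a minimal C++
sketch of its first-order idea, under simplifying assumptions: between
miss events the core dispatches at its full width, and each miss event
(branch misprediction, cache miss, ...) adds a penalty. `MissEvent` and
`intervalCycles` are hypothetical names, and the penalties are assumed to
be precomputed; the real model also accounts for overlapping misses and
dependences, which this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical miss-event record: where it occurs in the dynamic
// instruction stream and its (assumed precomputed) penalty in cycles.
struct MissEvent {
    std::uint64_t instIndex;
    std::uint64_t penalty;
};

// First-order interval estimate: ceil(N / D) cycles of full-width dispatch,
// plus the penalty of every miss event that falls inside the N instructions.
// Overlapping misses and dependences are deliberately not modeled.
std::uint64_t intervalCycles(std::uint64_t numInsts, unsigned width,
                             const std::vector<MissEvent>& events) {
    assert(width > 0);
    std::uint64_t cycles = (numInsts + width - 1) / width;
    for (const MissEvent& e : events) {
        if (e.instIndex < numInsts) cycles += e.penalty;
    }
    return cycles;
}
```

For example, 1000 instructions on a 4-wide core with two misses costing
200 and 15 cycles come out at 250 + 215 = 465 cycles.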

I'm trying to figure out how generically I can parallelize m5. Ideally,
every component or group of components (SimObjects) suited to run in
parallel would be driven by its own clock. The main challenge might be to
adapt components so that they can communicate asynchronously. Once this is
taken care of, we can move on to implementing channels that facilitate
asynchronous communication between components. These channels will mainly
protect against concurrent access and, whenever needed, serialize the
requests from asynchronously running SimObjects (e.g. CPU cores) to a
shared resource (e.g. an L2 D-cache). When this is finished, it would just
be a matter of assigning work to threads, scheduling them, and allocating
memory (easy if shared, difficult if distributed) in the core of m5.
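
To make the channel idea concrete, here is a minimal C++ sketch, assuming
each core runs in its own thread and a single thread owns the shared
cache. `Request`, `Channel` and `runSharedL2Demo` are hypothetical names:
the channel is a mutex-protected queue with many producers (cores) and
one consumer (the shared L2), so the L2 sees requests strictly one at a
time.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical request from a core to a shared cache.
struct Request {
    int coreId;
    std::uint64_t addr;
};

// Many-producer, single-consumer channel; the mutex serializes access.
class Channel {
public:
    void send(Request r) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            queue_.push_back(r);
        }
        cv_.notify_one();
    }
    // Blocks until a request arrives; returns nullopt once closed and empty.
    std::optional<Request> receive() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        Request r = queue_.front();
        queue_.pop_front();
        return r;
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::deque<Request> queue_;
    bool closed_ = false;
};

// numCores producer threads each send perCore requests to one consumer
// thread standing in for a shared L2; returns requests served per core.
std::vector<std::size_t> runSharedL2Demo(int numCores, int perCore) {
    assert(numCores > 0 && perCore >= 0);
    Channel ch;
    std::vector<std::size_t> servedPerCore(static_cast<std::size_t>(numCores), 0);
    std::thread l2([&] {
        while (auto req = ch.receive())  // only this thread writes servedPerCore
            ++servedPerCore[static_cast<std::size_t>(req->coreId)];
    });
    std::vector<std::thread> cores;
    for (int c = 0; c < numCores; ++c) {
        cores.emplace_back([&ch, c, perCore] {
            for (int i = 0; i < perCore; ++i)
                ch.send({c, static_cast<std::uint64_t>(i) * 64});
        });
    }
    for (auto& t : cores) t.join();
    ch.close();
    l2.join();
    return servedPerCore;
}
```

Note that this queue orders requests by wall-clock arrival, not by
simulated time; a simulator channel would additionally have to respect
timestamps, which is exactly where the accuracy questions come in.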

I've read some of the m5 version 2 documentation and code, and it seems
that considerable effort has already gone into facilitating asynchronous
communication between components (cf. the memory system), albeit
primarily to simplify the design rather than to enable parallelization.
If m5 is consistently designed this way, parallelizing it could be fairly
simple. For example, it would just be a matter of implementing a layer
(some sort of proxy object) between ports to handle the asynchronous
communication in the memory hierarchy, while the MemObjects themselves
could remain untouched. Furthermore, since all memory systems use the
port interface, this could be done very generically. My main question,
then, is whether there are still SimObjects in m5 v2 that don't
communicate in such a generic, event-based manner. Perhaps some
subsystems will fail when called upon asynchronously, and perhaps I'm
overlooking other serious issues. I also get the feeling that v2 is miles
ahead of v1.1 in this area.
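
A minimal C++ sketch of such a proxy, assuming a simplified port
interface: `Packet`, `Port`, `ProxyPort` and `RecordingPort` are
hypothetical stand-ins, not m5's actual classes. The sender's thread calls
`recvTiming()` on the proxy, which only buffers the packet with a
delivery tick (send tick plus an assumed link latency); the receiver's
thread later calls `deliverUpTo()`, which forwards due packets to the
unmodified target port.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <vector>

using Tick = std::uint64_t;

// Simplified, hypothetical packet and port types (not m5's real classes).
struct Packet {
    std::uint64_t addr;
    Tick sendTick;
};

class Port {
public:
    virtual ~Port() = default;
    virtual bool recvTiming(const Packet& pkt) = 0;
};

// Proxy inserted between two ports owned by different threads.
class ProxyPort : public Port {
public:
    ProxyPort(Port& target, Tick linkLatency)
        : target_(target), latency_(linkLatency) {}

    // Called from the sender's thread: only buffers the packet.
    bool recvTiming(const Packet& pkt) override {
        std::lock_guard<std::mutex> lock(mtx_);
        pending_.push(Entry{pkt.sendTick + latency_, seq_++, pkt});
        return true;  // always accepts; flow control is not modeled
    }

    // Called from the receiver's thread: forwards, in delivery-tick order,
    // every buffered packet due at or before `now`.
    std::size_t deliverUpTo(Tick now) {
        std::vector<Packet> due;
        {
            std::lock_guard<std::mutex> lock(mtx_);
            while (!pending_.empty() && pending_.top().when <= now) {
                due.push_back(pending_.top().pkt);
                pending_.pop();
            }
        }
        for (const Packet& p : due) {
            [[maybe_unused]] bool accepted = target_.recvTiming(p);  // no lock held
            assert(accepted && "this sketch assumes the target never refuses");
        }
        return due.size();
    }

private:
    struct Entry {
        Tick when;
        std::uint64_t seq;  // tie-breaker: FIFO among equal delivery ticks
        Packet pkt;
        bool operator>(const Entry& o) const {
            return when != o.when ? when > o.when : seq > o.seq;
        }
    };
    Port& target_;
    Tick latency_;
    std::mutex mtx_;
    std::uint64_t seq_ = 0;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pending_;
};

// Example receiver standing in for an unmodified MemObject's port.
class RecordingPort : public Port {
public:
    bool recvTiming(const Packet& pkt) override {
        received.push_back(pkt.addr);
        return true;
    }
    std::vector<std::uint64_t> received;
};
```

Because the target only ever sees ordinary `recvTiming()` calls from its
own thread, the object behind it needs no locking of its own. Refused
packets and retries are not handled here.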

If all these conditions were met, only a few more changes would have to
be made to the main simulation loop. We would have to create threads,
assign threads to event queue processing loops, assign event queues to
SimObjects, and insert objects, as described above, between the ports of
SimObjects to handle the communication. What are your feelings about
OpenMP as the API for coding this? I'm hoping OpenMP will take most of
the fine-grained optimization work off my shoulders.
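
A minimal sketch of what that loop could look like, assuming one
hypothetical `EventQueue` per partition (a stand-in, not m5's own event
queue class) and a fixed synchronization quantum: in each quantum an
OpenMP `parallel for` processes every queue up to the quantum boundary,
and the loop's implicit barrier keeps partitions from drifting apart.
Without `-fopenmp` the pragma is ignored and the same code runs serially
with identical results.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Tick = std::uint64_t;

// Hypothetical per-partition event queue (not m5's real class).
class EventQueue {
public:
    void schedule(Tick when, std::function<void()> action) {
        events_.push(Event{when, seq_++, std::move(action)});
    }

    // Run every event scheduled before `limit`, in (tick, insertion) order.
    std::size_t runUntil(Tick limit) {
        std::size_t ran = 0;
        while (!events_.empty() && events_.top().when < limit) {
            Event ev = events_.top();  // copy out, then pop
            events_.pop();
            ev.action();               // may schedule more events on this queue
            ++ran;
        }
        return ran;
    }

    bool empty() const { return events_.empty(); }

private:
    struct Event {
        Tick when;
        std::uint64_t seq;  // tie-breaker: FIFO among equal ticks
        std::function<void()> action;
        bool operator>(const Event& o) const {
            return when != o.when ? when > o.when : seq > o.seq;
        }
    };
    std::uint64_t seq_ = 0;
    std::priority_queue<Event, std::vector<Event>, std::greater<Event>> events_;
};

// Advance all partitions in lock-step quanta up to (but excluding) `end`.
// Inside a quantum each queue may only be touched by its own iteration;
// cross-partition messages would be exchanged after the parallel loop.
std::size_t runQuanta(std::vector<EventQueue>& queues, Tick quantum, Tick end) {
    assert(quantum > 0);
    std::size_t total = 0;
    const long n = static_cast<long>(queues.size());
    for (Tick start = 0; start < end; start += quantum) {
        const Tick limit = std::min(start + quantum, end);
        // Ignored (serial execution) when not compiled with -fopenmp.
        #pragma omp parallel for schedule(dynamic) reduction(+ : total)
        for (long i = 0; i < n; ++i) {
            total += queues[static_cast<std::size_t>(i)].runUntil(limit);
        }
        // Implicit barrier: every partition has now reached `limit`.
    }
    return total;
}
```

The quantum is where speed and accuracy trade off: messages between
partitions can only be exchanged at quantum boundaries, so if the quantum
exceeds the shortest cross-partition latency, some messages arrive later
than they should.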

Another concern of mine is the memory interconnection network. The cores
in many-core processors will definitely not be communicating with each
other over a shared bus, but over a more advanced interconnection network.
How far along is m5's implementation of such interconnection networks?
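
As a point of reference for what such a network model has to capture,
here is a minimal C++ sketch of about the simplest timing model for a 2D
mesh with dimension-order (XY) routing: latency grows with the Manhattan
distance between source and destination. The `Mesh` type and its per-hop
and fixed costs are hypothetical parameters; contention, virtual
channels and link bandwidth are not modeled.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical 2D mesh: nodes numbered row-major on a width x height grid,
// packets follow dimension-order (XY) routing.
struct Mesh {
    int width;
    int height;
    unsigned hopLatency;      // assumed cycles per router + link traversal
    unsigned routerOverhead;  // assumed fixed injection/ejection cost

    int hops(int src, int dst) const {
        assert(src >= 0 && src < width * height);
        assert(dst >= 0 && dst < width * height);
        int dx = std::abs(src % width - dst % width);
        int dy = std::abs(src / width - dst / width);
        return dx + dy;  // XY routing always takes a minimal path in a mesh
    }

    unsigned latency(int src, int dst) const {
        return routerOverhead + hopLatency * static_cast<unsigned>(hops(src, dst));
    }
};
```

For 256 cores on a 16 x 16 mesh, the corner-to-corner path is 30 hops, so
with these assumed costs it takes 2 + 3 * 30 = 92 cycles.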

At this point I am unaware of any efforts to parallelize m5 apart from a
project idea for the 2008 Google Summer of Code. Have any such efforts
been made? If so, it might be possible to coordinate our efforts. If not,
I would appreciate your thoughts on how best to approach this. Note that
it is not my intention to parallelize the whole of m5. The main
objective is to be able to simulate several cores, the interconnection
network and the memory hierarchy in parallel. However, I would like to do
this as generically as possible so that we aren't confined to the
parallel simulation of a single configuration.
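
One way to keep this generic is to derive the synchronization parameters
from the configuration rather than hard-coding them. A minimal C++
sketch, assuming a hypothetical `Link` description of object-to-object
connections and an arbitrary assignment of objects to event queues: with
barrier synchronization every quantum, the largest quantum for which no
message between queues can arrive inside the quantum in which it was sent
is the smallest latency of any link that crosses queues.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

using Tick = std::uint64_t;

// Hypothetical description of a connection between two simulated objects.
struct Link {
    std::string a;
    std::string b;
    Tick latency;  // minimum time for a message to cross this link
};

// Smallest latency among links whose endpoints sit in different queues.
// Links inside one queue do not constrain the quantum.
Tick maxSafeQuantum(const std::vector<Link>& links,
                    const std::unordered_map<std::string, int>& partitionOf) {
    Tick q = std::numeric_limits<Tick>::max();  // no cross links: unbounded
    for (const Link& l : links) {
        auto ia = partitionOf.find(l.a);
        auto ib = partitionOf.find(l.b);
        assert(ia != partitionOf.end() && ib != partitionOf.end());
        if (ia->second != ib->second) q = std::min(q, l.latency);
    }
    return q;
}
```

Any configuration and partitioning can be fed to the same function;
grouping tightly coupled objects (e.g. a core and its L1 caches) into one
queue keeps their short links out of the minimum.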

Eagerly awaiting your reply,

Stijn Souffriau

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
