Dear developers,

I'm a senior computer science student at Ghent University, starting work on my master's thesis. I'm working on the simulation of many-core systems together with researchers from the Department of Electronics and Information Systems. By many-core I mean hundreds of cores. We think we will be able to simulate that many cores by combining two techniques: non-cycle-accurate core simulation based on interval simulation, and parallelization. The former is largely taken care of; the latter is what I will be working on. In other words, my job is to scale the simulator up to many cores and to address any accuracy and synchronization problems along the way. Prior work has been implemented as an interval-simulating CPU in the m5-1.1 simulator, and we plan to implement future work in version 2 of m5. At this early stage I have several questions, mostly about how easily m5 can be modified, which is why I'm writing to you. I have no experience with the m5 code yet, so I would appreciate your thoughts on this matter.
I'm trying to figure out how generically I can parallelize m5. Ideally, every component or group of components (SimObjects) suited to run in parallel would be driven by its own clock. The main challenge might be adapting components so that they can communicate asynchronously. Once that is taken care of, we can implement channels to carry the asynchronous communication between components. These channels will mainly protect shared resources against concurrent access and, whenever needed, serialize the requests from asynchronously running SimObjects (e.g. CPU cores) to a shared resource (e.g. the L2 D-cache). After that, it would just be a matter of assigning work to threads, scheduling them and allocating memory (easy if shared, difficult if distributed) in the core of m5.

I've read some of the m5 version 2 documentation and code, and it seems that considerable effort has already gone into facilitating asynchronous communication between components (cf. the memory system), though primarily for the sake of simplicity rather than parallelization. If m5 is consistently designed this way, parallelizing it could be fairly simple. For example, it would just be a matter of implementing a layer (some sort of proxy object) between ports to handle the asynchronous communication in the memory hierarchy, while the MemObjects themselves could remain untouched. Furthermore, since all memory systems use the port interface, this could be done very generically.

My main question, then, is whether there are still SimObjects in m5 v2 that don't communicate in such a generic, event-based manner. Maybe some subsystems will fail when called asynchronously, and maybe I'm even overlooking other serious issues. I also get the feeling that v2 is miles ahead of v1.1 in this area. If all these conditions were met, only a few more changes would have to be made to the main simulation loop.
We would have to create threads, assign threads to event-queue processing loops, assign event queues to SimObjects and insert objects, as described above, between the ports of SimObjects to handle the communication. What are your feelings about OpenMP for coding this? I'm hoping OpenMP will take most of the fine-grained optimization work off my shoulders.

Another concern of mine is the memory interconnection network. The cores in many-core processors will certainly not be communicating with each other over a shared bus, but over a more advanced interconnection network. How far along is the implementation of these interconnection networks?

At this point I am unaware of any efforts to parallelize m5 apart from a project idea for Google Summer of Code 2008. Have any efforts been made? If so, it might be possible to coordinate our efforts. If not, I would appreciate your thoughts on how best to approach this. Note that it is not my intention to parallelize the whole of m5. The main objective is to be able to simulate several cores, the interconnection network and the memory hierarchy in parallel. However, I would like to do this as generically as possible, so that we aren't confined to the parallel simulation of a single configuration.

Eagerly awaiting your reply,
Stijn Souffriau

_______________________________________________
m5-dev mailing list
http://m5sim.org/mailman/listinfo/m5-dev
