On Oct 4, 2009, at 8:11 AM, Stijn Souffriau wrote:

> Dear developers,
>
> I'm a senior computer science student at Ghent University starting work
> on my master's thesis. I'm working on the simulation of many-core
> systems together with researchers from the department of electronics
> and information systems. Know that by many cores I mean hundreds of
> cores. We think we will be able to simulate these cores by using two
> techniques. One is non-cycle-accurate core simulation, based on
> interval simulation, and the other is parallelization. The former is
> pretty much taken care of; the latter is what I will be working on. You
> could say my job is to scale the simulator up to many cores and address
> any accuracy and synchronization problems along the way. Prior work has
> been implemented as an interval-simulating CPU in the m5-1.1 simulator.
> We plan on implementing future work in version 2 of m5. At this early
> stage I'm faced with several questions, mostly concerning the
> modifiability of m5, and this is why I'm writing you. I have absolutely
> no experience in coding m5 yet, which is why I would appreciate your
> thoughts on this matter.

This is great! It's something that we've slowly been working towards.
Nate is also probably going to chime in, but I'll tell you a bit about
what we've been thinking.
> I'm trying to figure out how generically I can parallelize m5. Ideally
> all the components or groups of components (SimObjects) suited to run
> in parallel will be driven by their own clock. The main challenge might
> be to adapt components so that they can communicate asynchronously.
> When this is taken care of we can move on to implementing channels
> which will facilitate the asynchronous communication between the
> components. These channels will mainly protect against concurrency
> issues and serialize the requests from asynchronously running
> SimObjects (e.g. CPU cores) to a shared resource (e.g. L2 D-cache)
> whenever needed. When this is finished it would just be a matter of
> assigning work to threads, scheduling them and allocating memory (easy
> if shared, difficult if distributed) in the core of m5.

We've recently started down this route. The two changes that have been
implemented so far are a new configuration system in Python that
supports inheritance, and the use of that configuration system to set a
pointer to the event queue on which each object should schedule all its
events. Currently all the objects schedule their own events; however,
not all SimObjects support the new coherence system yet.

The three big pieces that are missing are: how threads should
cross-schedule events on other threads' queues (for communication
between resources assigned to different queues), how the different
threads should be kept in sync, and the building of an interconnect that
supports dealing with communicating objects on different threads.

Depending on the number of events flowing between the threads, the way
they communicate is very important; ideally some sort of lock-free data
structure could be used for this. One of our goals was that threaded
simulation and single-threaded simulation should provide exactly the
same result, in which case events between threads must be scheduled on
the exact same cycle in both cases. If there is a large delay (in
simulated time) between the two thread domains this is not too bad;
however, if there is a short delay it's not clear yet how this can be
done effectively. Finally, we had envisioned new thread-aware
interconnect objects which would do the right thing to pass events
between threads.

With various hacks, a summer student at Michigan had made some progress
on running two different systems in the same simulation process on
different threads, but the implementation was less than ideal. However,
two systems running at the same time is a good initial goal and can be
used to test the sensitivity of the threads/synchronization to the size
of the simulation quantum. Additionally, an ethernet link could have a
reasonable latency and would probably make for a good place to first try
out communication between two threads (each representing a system).
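To make the cross-thread scheduling and quantum discussion above a bit
more concrete, here is a minimal C++ sketch. It is not M5 code:
EventQueue, scheduleRemote, the fixed quantum and the toy ping workload
are all assumptions made purely for illustration.

    // A minimal sketch (not M5 code): each simulation thread owns an event
    // queue; events from other threads go through a mutex-protected inbox,
    // and a barrier keeps all threads on the same simulation quantum.
    #include <barrier>            // C++20
    #include <cstdint>
    #include <cstdio>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    using Tick = uint64_t;

    struct Event {
        Tick when;
        std::function<void()> action;
        bool operator>(const Event &o) const { return when > o.when; }
    };

    class EventQueue {
      public:
        // Used by the owning thread only.
        void schedule(Tick when, std::function<void()> action) {
            queue_.push({when, std::move(action)});
        }

        // Used by other threads; a lock-free MPSC queue could replace the
        // mutex if the cross-thread event rate turns out to be high.
        void scheduleRemote(Tick when, std::function<void()> action) {
            std::lock_guard<std::mutex> g(inboxMutex_);
            inbox_.push_back({when, std::move(action)});
        }

        // Execute everything scheduled strictly before quantumEnd.
        void runUntil(Tick quantumEnd) {
            drainInbox();
            while (!queue_.empty() && queue_.top().when < quantumEnd) {
                Event e = queue_.top();
                queue_.pop();
                now_ = e.when;
                e.action();
                drainInbox();          // remote events may have arrived
            }
            now_ = quantumEnd;
        }

      private:
        void drainInbox() {
            std::lock_guard<std::mutex> g(inboxMutex_);
            for (auto &e : inbox_)
                queue_.push(std::move(e));
            inbox_.clear();
        }

        std::priority_queue<Event, std::vector<Event>,
                            std::greater<Event>> queue_;
        std::vector<Event> inbox_;
        std::mutex inboxMutex_;
        Tick now_ = 0;
    };

    int main() {
        constexpr Tick quantum = 1000;   // <= shortest cross-thread latency
        constexpr int numThreads = 2;
        std::vector<EventQueue> queues(numThreads);
        std::barrier<> sync(numThreads); // all threads agree on quantum ends

        auto simThread = [&](int id) {
            for (Tick end = quantum; end <= 5 * quantum; end += quantum) {
                if (id == 0) {
                    // Toy workload: thread 0 pings thread 1; the target tick
                    // falls in the receiver's *next* quantum, never too late.
                    Tick t = end + quantum / 2;
                    queues[1].scheduleRemote(t, [t] {
                        std::printf("thread 1: event at tick %llu\n",
                                    (unsigned long long)t);
                    });
                }
                queues[id].runUntil(end);
                sync.arrive_and_wait();  // nobody starts the next quantum early
            }
        };

        std::vector<std::thread> threads;
        for (int i = 0; i < numThreads; ++i)
            threads.emplace_back(simThread, i);
        for (auto &t : threads)
            t.join();
        return 0;
    }

The barrier is what bounds how far the threads' simulated clocks can
drift apart; for the same-result-as-single-thread goal described above,
remotely scheduled events also have to land in or after the receiver's
next quantum, which is why the quantum cannot exceed the shortest
cross-thread latency (for example, the ethernet link delay mentioned
above).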
> I've read some of the m5 version 2 documentation and code, and it seems
> that quite some effort has already been put into facilitating
> asynchronous communication between components (cf. the memory system),
> yet primarily for reasons of simplification rather than for the sake of
> parallelization. If m5 is consistently designed in this way then
> parallelizing it could be fairly simple. For example, it would just be
> a matter of implementing a layer (some sort of proxy object) between
> ports to facilitate the asynchronous communication in the memory
> hierarchy, but the MemObjects themselves could remain untouched.
> Furthermore, since all memory systems use the port interface, this
> could be done very generically. My main question, then, is whether
> there are still SimObjects in m5 v2 which don't communicate in such a
> generic, event-based manner. Maybe some subsystems will fail when
> called upon asynchronously, and maybe I'm even overlooking some other
> serious issues. I also get the feeling that v2 is miles ahead of v1.1
> in this area.

Our timing memory system does support asynchronous communication,
because most real-world memory systems do, and M5 v2.0 is several miles
ahead of v1.1 here. All objects in the memory system support both atomic
and timing mode accesses; these objects inherit from MemObject, which in
turn inherits from SimObject. There are SimObjects that don't
communicate through events, but it's doubtful that you would ever want
one of them in a different thread. These are things like TLBs and
interrupt controllers, which are pretty much welded to the CPU that
they're responsible for.

I think other people will probably be best placed to answer the rest of
your questions.

Ali

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
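As a rough illustration of the atomic versus timing port accesses
mentioned above, here is a small self-contained C++ sketch. It is
deliberately not the real M5 Port/MemObject interface: Packet, PortPeer,
ThreadBridgePort and ToyMemory are invented names, and the bridge only
hints at where a thread-crossing proxy of the kind Stijn proposes could
intercept timing requests without the MemObjects on either side
changing.

    // Illustrative only -- not the real M5 Port/MemObject API.  It mimics
    // the shape of the idea: the same port connection can carry an atomic
    // access (the call returns a latency immediately) or a timing access
    // (the request is accepted now and the response arrives later), which
    // is what would let a thread-crossing proxy sit between two ports.
    #include <cstdint>
    #include <cstdio>
    #include <functional>

    using Tick = uint64_t;

    struct Packet {
        uint64_t addr;
        bool isRead;
    };

    // One side of a port pair, reduced to a plain interface.
    class PortPeer {
      public:
        virtual ~PortPeer() = default;
        // Atomic mode: perform the access and return its latency at once.
        virtual Tick recvAtomic(Packet &pkt) = 0;
        // Timing mode: hand the packet over; the response comes back later
        // through a callback the receiver invokes (or schedules).
        virtual bool recvTiming(Packet &pkt,
                                std::function<void(Packet &)> respond) = 0;
    };

    // A hypothetical proxy between two ports owned by different threads.
    // In a threaded build, recvTiming() would post an event on the peer's
    // event queue instead of calling it directly; atomic mode expects an
    // immediate answer and so does not parallelize.
    class ThreadBridgePort : public PortPeer {
      public:
        explicit ThreadBridgePort(PortPeer &peer) : peer_(peer) {}
        Tick recvAtomic(Packet &pkt) override { return peer_.recvAtomic(pkt); }
        bool recvTiming(Packet &pkt,
                        std::function<void(Packet &)> respond) override {
            return peer_.recvTiming(pkt, std::move(respond));
        }
      private:
        PortPeer &peer_;
    };

    // A toy responder standing in for a memory-side object.
    class ToyMemory : public PortPeer {
      public:
        Tick recvAtomic(Packet &) override { return 30; }  // pretend latency
        bool recvTiming(Packet &pkt,
                        std::function<void(Packet &)> respond) override {
            respond(pkt);   // respond immediately just for the demo
            return true;
        }
    };

    int main() {
        ToyMemory mem;
        ThreadBridgePort bridge(mem);
        Packet p{0x1000, true};
        std::printf("atomic latency: %llu ticks\n",
                    (unsigned long long)bridge.recvAtomic(p));
        bridge.recvTiming(p, [](Packet &pkt) {
            std::printf("timing response for addr %#llx\n",
                        (unsigned long long)pkt.addr);
        });
        return 0;
    }

The property the sketch tries to capture is that a timing access
decouples delivering the request from delivering the response, which is
exactly the hook a cross-thread proxy needs; an atomic access expects
its answer within the same call and therefore has to stay within one
thread.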
