Hello again,

It's been a while since my last email. I've made some progress in getting
parts of m5 to run in parallel, but I've reached a critical phase and have
some questions that I'm sure some of the people reading can help me with.

My main objective has always been to simulate the CPU cores and their private
caches in parallel, along with the rest of the shared memory system. The
easiest way to do this is to place an element between the port interfaces
that handles concurrency: it forwards member calls made on the port on one
side to the peer port on the other side by means of a remotely processed
event, much like a remote procedure call.
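
To make the idea concrete, here is a minimal sketch of such a forwarding
element. It is not m5 code: RemoteQueue, SimplePort, Memory and PortBridge are
names made up for illustration, and the port interface is reduced to a
functional write that carries two integers instead of a packet. It only
covers the easy case, a call without a return value.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical per-thread event queue (a stand-in, not m5's EventQueue).
class RemoteQueue {
  public:
    // Called from any thread: enqueue an event for the owning thread.
    void post(std::function<void()> ev) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            events_.push_back(std::move(ev));
        }
        ready_.notify_one();
    }

    // Called from any thread: let run() return once the queue is empty.
    void stop() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
    }

    // Called on the owning thread: process events until stopped and drained.
    void run() {
        for (;;) {
            std::function<void()> ev;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !events_.empty(); });
                if (events_.empty())
                    return;  // only reachable once stopping_ is set
                ev = std::move(events_.front());
                events_.pop_front();
            }
            ev();  // run outside the lock so events may post further events
        }
    }

  private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> events_;
    bool stopping_ = false;
};

// Hypothetical, heavily simplified port interface (functional writes only).
struct SimplePort {
    virtual ~SimplePort() = default;
    virtual void sendFunctional(int addr, int data) = 0;
};

// Toy memory on the far side of the bridge.
struct Memory : SimplePort {
    std::map<int, int> store;
    void sendFunctional(int addr, int data) override { store[addr] = data; }
};

// The element placed between two ports: every call becomes an event that is
// executed on the peer's thread.
class PortBridge : public SimplePort {
  public:
    PortBridge(RemoteQueue &peerQueue, SimplePort &peer)
        : peerQueue_(peerQueue), peer_(peer) {}

    void sendFunctional(int addr, int data) override {
        peerQueue_.post([this, addr, data] { peer_.sendFunctional(addr, data); });
    }

  private:
    RemoteQueue &peerQueue_;
    SimplePort &peer_;
};

// Usage: write through the bridge from this thread, apply it on the worker.
int demoForward() {
    RemoteQueue queue;
    Memory memory;
    PortBridge bridge(queue, memory);
    std::thread worker([&queue] { queue.run(); });
    bridge.sendFunctional(0x40, 7);
    queue.stop();
    worker.join();  // join orders the worker's write before our read
    return memory.store[0x40];
}
```

Because the call returns nothing, the calling thread never has to wait for
the peer thread.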

The problem is that these procedures have a return value, with the exception
of functional calls. The only generic way to maintain consistency is to block
the call on one thread and wait for the return value from the other thread.
However, if two calls from opposing ports are executed at the same time you
get a deadlock: each thread is blocked waiting for a reply that only the
other, equally blocked, thread can produce. This is clearly a big problem,
since in general you can't interrupt one of the calls.
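
The situation can be reproduced in a few lines. This is a sketch under
assumptions, not m5 code: Inbox, blockingCall and countStuckCalls are
hypothetical names, and the returned latency of 10 is made up. Each thread
posts a request to the other and then waits for the reply without serving its
own inbox. A real blocking call would wait forever; the sketch uses a 50 ms
timeout only so that it terminates and can report what happened.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical inbox of events that only its owning thread executes.
struct Inbox {
    std::mutex mutex;
    std::vector<std::function<void()>> events;

    void post(std::function<void()> ev) {
        std::lock_guard<std::mutex> guard(mutex);
        events.push_back(std::move(ev));
    }

    void drain() {
        std::vector<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> guard(mutex);
            pending.swap(events);
        }
        for (auto &ev : pending)
            ev();
    }
};

// Blocking call in the style of an atomic access: ask the peer thread for a
// latency and wait for it. Returns false if no reply arrived in time.
bool blockingCall(Inbox &peer, std::chrono::milliseconds timeout) {
    auto reply = std::make_shared<std::promise<int>>();
    std::future<int> latency = reply->get_future();
    peer.post([reply] { reply->set_value(10); });  // 10 is a made-up latency
    return latency.wait_for(timeout) == std::future_status::ready;
}

// Runs two opposing blocking calls and returns how many of them got stuck.
int countStuckCalls() {
    Inbox a, b;
    std::atomic<int> stuck{0};
    std::atomic<int> finishedWaiting{0};

    auto side = [&](Inbox &mine, Inbox &peer) {
        if (!blockingCall(peer, std::chrono::milliseconds(50)))
            ++stuck;
        // A blocked thread cannot serve its inbox. To model that faithfully,
        // neither side drains until both have given up waiting.
        ++finishedWaiting;
        while (finishedWaiting.load() < 2)
            std::this_thread::yield();
        mine.drain();
    };

    std::thread ta(side, std::ref(a), std::ref(b));
    std::thread tb(side, std::ref(b), std::ref(a));
    ta.join();
    tb.join();
    return stuck.load();
}
```

With the timeout replaced by an unbounded wait, neither thread would ever
reach its drain() call.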

This makes parallel atomic calls between private caches pretty much
impossible, unless I rewrite the port interface and the implementing classes
or have some guarantee that the state of the objects remains consistent if,
for example, one atomic call gets executed while another is blocking.

Timing calls are interesting. They return a value as well, but it only
signals point-to-point acceptance. So in theory, in case of a deadlock, I
could simply return false and send a retry a few ticks later, after which the
call would start over. However, the semantics of "false" and its effect on
the simulation are unclear to me. Could this affect the accuracy or even the
functional correctness? It might even be possible to return true when a
deadlock is detected and handle the retry separately in case the remote end
returns false, which would be more efficient.
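
A rough, single-threaded sketch of the first option (refuse and retry later)
could look like this. TimingBridge and all of its members are hypothetical
names, packets are reduced to integer ids, and the deadlock detection is
reduced to a flag that says whether the opposite side currently has a call in
flight. It does not cover the "return true and retry separately" variant.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bridge state for one direction of a port pair.
class TimingBridge {
  public:
    // Marks that the opposite side has a call in flight towards us, i.e. the
    // situation in which blocking here would deadlock.
    void opposingCallStarted() { opposingInFlight_ = true; }

    // Clears the in-flight mark. Returns true if a sender was refused in the
    // meantime and should now be told to retry.
    bool opposingCallFinished() {
        opposingInFlight_ = false;
        bool owed = retryOwed_;
        retryOwed_ = false;
        return owed;
    }

    // Timing-style send: refuse instead of blocking when blocking would
    // deadlock, and remember that the sender is owed a retry.
    bool sendTiming(int packetId) {
        if (opposingInFlight_) {
            retryOwed_ = true;
            return false;
        }
        delivered_.push_back(packetId);
        return true;
    }

    const std::vector<int> &delivered() const { return delivered_; }

  private:
    bool opposingInFlight_ = false;
    bool retryOwed_ = false;
    std::vector<int> delivered_;
};

// Usage: the first attempt collides with an opposing call and is refused;
// the retry after that call has finished is accepted.
int attemptsUntilAccepted() {
    TimingBridge bridge;
    bridge.opposingCallStarted();
    int attempts = 1;
    if (!bridge.sendTiming(42)) {
        // "A few ticks later": the opposing call completes and a retry is due.
        if (bridge.opposingCallFinished()) {
            ++attempts;
            bool accepted = bridge.sendTiming(42);
            assert(accepted);
            (void)accepted;
        }
    }
    return attempts;
}
```

In this sketch the refused packet is simply sent again by the caller. Whether
the extra ticks spent waiting for the retry change the simulated timing is
exactly the open question above.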

If timing calls can't be parallelized this way either, I'm going to have to
recode some large, complicated chunks of m5, so I'm hoping it won't come to
that.

Assuming that problem is solved, the question remains whether parts of the
memory system can even run concurrently in a safe way, since I'm guessing
pointers to data are shared between MemObjects. In theory the cache coherence
protocol should prohibit concurrent, incoherent reads/writes, but I don't
know the code that well.

Thanks in advance,

Stijn


_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev