On Fri, Feb 26, 2010 at 12:44 PM, Stijn Souffriau <[email protected]> wrote:
>
> To be fair, when I said disproves your hypothesis, I should have said that
> the assumption that synchronizing every couple of cycles would work fast is
> wrong and that high accuracy can still be obtained with much less
> synchronization.
>
> One of the questions I'm hoping to answer is what the speedup/accuracy
> difference is between a sluggish detailed simulator and a very fast not so
> detailed simulator.
>
> See:
>
> Slacksim
> Graphite
Thanks for the links (they did come through for me the first time). I think the Slacksim paper is the most relevant, since they're not trying to parallelize across machines. Figure 11 is pretty much in line with what I expected: the Q10 scheme (which is roughly what I had in mind when I was talking about "conservative simulation") does give good speedups (by my definition of "good" in my earlier message).

It's also true that relaxing the synchronization gives even better speedups... not a surprise, and not something I was disputing. Note that the difference between the two is smallest when you have a higher ratio of simulated cores to host cores; they see a bigger gap at 8 host cores in part because they are only simulating an 8-core target. I'd be very interested to see what those graphs look like for a 64-core target; the benefit of relaxing synchronization on 8 host cores will be smaller.

Another consideration that these papers seem to completely overlook is that conservative simulations can be made deterministic (not always easy, but doable), while these "under-synchronized" simulations are inherently non-deterministic. That has implications for both the stability of results and debuggability.

On a more practical note, I don't think the Slacksim design is that far off from (or at least not incompatible with) what we were discussing for timing mode. The difference is more one of software engineering: what we would want to do is take some of that mechanism and bury it inside the bus model, and also decouple threads from objects. (I'm really surprised that they use one pthread per target core (see p. 4); that's a lot of needless overhead when you're simulating 8 target cores on 2 host cores.)

I'll try to sketch out how I'd do a Slacksim-like model in M5... it'll take a bit, so I'll send that in a separate message later.

Steve

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
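For illustration only, here is a minimal Python sketch of the conservative, quantum-based synchronization discussed in this thread (in the spirit of the Q10 scheme), with host threads decoupled from target cores. Everything in it is hypothetical: `QUANTUM`, `LATENCY`, `Message`, `Core`, `simulate` and the toy send-to-neighbor workload are assumptions for the example, not Slacksim's or M5's code. It assumes every core-to-core message takes at least `QUANTUM` cycles to arrive, so messages can be buffered during a quantum and delivered at the barrier without violating causality. Because of Python's GIL it shows the structure, not a speedup.

```python
import threading
from dataclasses import dataclass, field

# Hypothetical parameters, chosen for illustration only.
QUANTUM = 10  # cycles each core runs between barriers (cf. "Q10")
LATENCY = 10  # assumed minimum core-to-core message latency, in cycles


@dataclass(order=True)
class Message:
    arrival: int  # cycle at which the destination core sees the message
    src: int
    dst: int


@dataclass
class Core:
    cid: int
    inbox: list = field(default_factory=list)  # pending messages, sorted
    log: list = field(default_factory=list)    # (cycle, src) of consumed messages

    def run_quantum(self, start, end, n_cores, outbox):
        for cycle in range(start, end):
            # Consume every message that arrives exactly at this cycle.
            while self.inbox and self.inbox[0].arrival == cycle:
                self.log.append((cycle, self.inbox.pop(0).src))
            # Toy workload (hypothetical): message the next core every cid+3 cycles.
            if cycle % (self.cid + 3) == 0:
                dst = (self.cid + 1) % n_cores
                outbox.append(Message(cycle + LATENCY, self.cid, dst))


def simulate(n_cores, n_threads, n_quanta):
    # Lookahead: a message sent during a quantum cannot arrive before the
    # next quantum starts, so delivering it at the barrier is safe.
    assert LATENCY >= QUANTUM
    cores = [Core(i) for i in range(n_cores)]
    outboxes = [[] for _ in range(n_threads)]  # one per host thread, no locking

    def deliver():
        # Barrier action: runs in one thread while all the others are waiting.
        for msg in sorted(m for box in outboxes for m in box):
            cores[msg.dst].inbox.append(msg)
        for box in outboxes:
            box.clear()
        for core in cores:
            core.inbox.sort()  # canonical (arrival, src, dst) order

    barrier = threading.Barrier(n_threads, action=deliver)

    def worker(tid):
        # Threads are decoupled from cores: thread tid runs cores tid, tid+M, ...
        mine = cores[tid::n_threads]
        for q in range(n_quanta):
            for core in mine:
                core.run_quantum(q * QUANTUM, (q + 1) * QUANTUM, n_cores, outboxes[tid])
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [core.log for core in cores]


if __name__ == "__main__":
    reference = simulate(n_cores=8, n_threads=1, n_quanta=20)
    for m in (2, 4):
        assert simulate(8, m, 20) == reference
    print("identical results with 1, 2 and 4 host threads")
```

Since each host thread owns a fixed subset of cores and cross-core traffic is exchanged only at the barrier, in a canonical sorted order, running the 8 target cores on 1, 2 or 4 host threads yields identical per-core logs. Removing the barrier and letting threads run ahead, as the relaxed schemes do, would make delivery order depend on thread scheduling, which is where the non-determinism mentioned in the message comes from.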
