Anirudh, 

  You might want to take a look at the SST project here at Sandia
(http://code.google.com/p/sst-simulator/). We've incorporated gem5 into a
parallel discrete-event framework and run it out to a few hundred nodes. We
use a latency-based conservative synchronization scheme similar to what you
describe (i.e., fixed lookahead), but the lookahead is bounded by the
minimum latency between nodes, so it is deterministic. This is all layered
over MPI, and the performance seems reasonable (~80% scaling efficiency).
(Note: we went with MPI instead of threading for scalability reasons. We
eventually want to look at several thousand nodes, and we don't have easy
access to a large enough multithreaded machine. Our early tests seem to
indicate that there is not a huge performance difference for our use case.)
Currently, we have a NIC based on the Portals API that can be connected
either to a model of the router used in the Cray XT3 series or to routers
from the Georgia Tech IRIS simulator.
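
  A rough sketch of that lookahead idea over MPI (illustrative only, not
SST code; the event-loop calls are placeholders):

// Lookahead-based conservative synchronization over MPI (sketch only).
#include <mpi.h>
#include <algorithm>
#include <cstdint>
#include <vector>

void run_rank(uint64_t end_time, const std::vector<uint64_t>& my_link_latencies)
{
    // The safe lookahead is the minimum link latency anywhere in the
    // system, so reduce each rank's local minimum to a global minimum.
    uint64_t local_min = *std::min_element(my_link_latencies.begin(),
                                           my_link_latencies.end());
    uint64_t lookahead = 0;
    MPI_Allreduce(&local_min, &lookahead, 1, MPI_UINT64_T, MPI_MIN,
                  MPI_COMM_WORLD);

    uint64_t now = 0;
    while (now < end_time) {
        uint64_t horizon = std::min(now + lookahead, end_time);
        // Run this rank's local event queue up to 'horizon'.  Nothing sent
        // during this window can arrive before now + lookahead, so no rank
        // can ever see an event from its own past.
        // advance_local_events(horizon);             // placeholder
        // exchange_remote_messages(MPI_COMM_WORLD);  // placeholder
        MPI_Barrier(MPI_COMM_WORLD);  // all ranks reach the horizon together
        now = horizon;
    }
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    // Example: two outgoing links with 500 ns and 800 ns latency.
    run_rank(1'000'000, {500, 800});
    MPI_Finalize();
    return 0;
}

Because the lookahead is fixed by the topology, every rank advances through
the same sequence of horizons, which is where the determinism comes from.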

  The current version of SST hangs a network interface off the memory bus
(see attached picture), but we are working on a more generic version which
would allow you to run each CPU on a different MPI rank to improve socket
simulation speeds. 



Thanks,

arun

On 3/29/12 10:47 AM, "Anirudh Sivaraman" <[email protected]> wrote:

>I have a design for a parallel version of gem5. I wanted to run it by
>the dev list before jumping in. The idea is to simulate a networked
>system of multiple machines, with the network simulation handled by
>ns-3, a standard network simulator. Each machine will be simulated by
>its own gem5 instance in a separate thread, and each instance will
>connect into ns-3 through a tap device (I hope to use ethertap.cc for
>this; ns-3 has some support for taps). ns-3 will act as a "router",
>forwarding packets between the gem5 instances. ns-3 is flexible enough
>to simulate wired and wireless networks, but that should hopefully not
>matter to gem5.
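>
>Roughly what I have in mind on the ns-3 side is something like the
>sketch below (illustrative only; the tap device names are placeholders
>for whatever the gem5 instances end up attaching to on the host):
>
>#include "ns3/core-module.h"
>#include "ns3/network-module.h"
>#include "ns3/csma-module.h"
>#include "ns3/tap-bridge-module.h"
>
>using namespace ns3;
>
>int main(int argc, char *argv[])
>{
>    // Real-time scheduling and real checksums are needed when packets
>    // enter the simulation from outside (the gem5 instances).
>    GlobalValue::Bind("SimulatorImplementationType",
>                      StringValue("ns3::RealtimeSimulatorImpl"));
>    GlobalValue::Bind("ChecksumEnabled", BooleanValue(true));
>
>    NodeContainer nodes;
>    nodes.Create(2);
>
>    // A simple CSMA segment stands in for the simulated network; this
>    // could just as well be a wireless or point-to-point topology.
>    CsmaHelper csma;
>    csma.SetChannelAttribute("DataRate", StringValue("1Gbps"));
>    csma.SetChannelAttribute("Delay", TimeValue(MicroSeconds(10)));
>    NetDeviceContainer devices = csma.Install(nodes);
>
>    // Each TapBridge glues one host tap device to one simulated device.
>    TapBridgeHelper tapBridge;
>    tapBridge.SetAttribute("Mode", StringValue("UseBridge"));
>    tapBridge.SetAttribute("DeviceName", StringValue("gem5-tap0"));
>    tapBridge.Install(nodes.Get(0), devices.Get(0));
>    tapBridge.SetAttribute("DeviceName", StringValue("gem5-tap1"));
>    tapBridge.Install(nodes.Get(1), devices.Get(1));
>
>    Simulator::Stop(Seconds(600.0));
>    Simulator::Run();
>    Simulator::Destroy();
>    return 0;
>}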
>
>The natural question is how to handle synchronization between the
>simulated times of the various gem5 instances. My idea is to use barrier
>synchronization between the gem5 instances at periodic time intervals.
>Let's assume this interval is 10 ms. Each gem5 instance then runs from 0
>through 10 ms of simulated time and waits until all the other instances
>have finished their 10 ms slice as well. The process then repeats from
>simulated time 10 to 20 ms, and so on. Consequently, instances never get
>out of sync by more than 10 ms. The interval is tunable: a shorter
>interval gives more accuracy but also a longer run time.
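>
>In rough C++ (using C++20's std::barrier as a stand-in for whatever
>barrier the threads actually use), the per-instance loop would look
>something like this; the call that advances a gem5 instance is a
>placeholder:
>
>#include <barrier>
>#include <cstdint>
>#include <thread>
>#include <vector>
>
>constexpr uint64_t kQuantumNs = 10'000'000;    // 10 ms of simulated time
>constexpr uint64_t kEndTimeNs = 1'000'000'000; // total simulated time
>
>int main()
>{
>    const int num_instances = 4;           // one gem5 instance per thread
>    std::barrier sync_point(num_instances);
>
>    auto worker = [&](int id) {
>        for (uint64_t now = 0; now < kEndTimeNs; now += kQuantumNs) {
>            // run_gem5_instance(id, now, now + kQuantumNs); // placeholder
>            sync_point.arrive_and_wait();  // nobody drifts ahead > 10 ms
>        }
>    };
>
>    std::vector<std::thread> threads;
>    for (int i = 0; i < num_instances; ++i)
>        threads.emplace_back(worker, i);
>    for (auto &t : threads)
>        t.join();
>    return 0;
>}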
>
>I realize that determinism is impossible in this framework, but that's
>a hit I am willing to take for my work. I wanted to know if there are
>any code examples that use ethertap.cc in the way etherlink.cc is used
>in twosys-tsunami-simple-atomic.py.
>
>Anirudh

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
