PDES in gem5 is implemented using a fairly standard quantum-based PDES approach (similar to WWT) where a barrier across all threads is enforced every N cycles. Objects (in practice, sub-trees of the object graph) can be assigned to their own event queues. Cross-queue scheduling of events is supported as long as the new events are scheduled at least N ticks into the future to ensure determinism.
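
As a rough illustration of the cross-queue rule above, here is a minimal sketch in plain Python. It is not gem5 code; the function name and the error message are made up for the example, and the only thing it models is "a cross-queue event must land at least one quantum (N ticks) in the future".

```python
def check_cross_queue_schedule(now, when, quantum):
    """Return `when` if a cross-queue event may be scheduled at that tick.

    Hypothetical helper: models the rule that an event sent to another
    queue must be at least `quantum` ticks ahead of the sender's time.
    """
    earliest = now + quantum
    if when < earliest:
        raise ValueError(
            "cross-queue event at tick %d is earlier than %d" % (when, earliest)
        )
    return when


# An event exactly one quantum ahead is accepted.
accepted = check_cross_queue_schedule(now=1000, when=1500, quantum=500)

# An event one tick short of the quantum is rejected.
try:
    check_cross_queue_schedule(now=1000, when=1499, quantum=500)
    rejected = False
except ValueError:
    rejected = True
```

With this rule in place, no queue can receive an event that falls inside the quantum it is currently executing, which is what keeps the barrier scheme deterministic.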

To enable the PDES functionality in gem5, you need to do two things from your configuration script: 1) assign objects to event queues (the eventq_index property) and 2) set the simulation quantum on the root object (the sim_quantum property). By default, all objects are assigned to event queue 0 and new queues are created as soon as the simulator encounters an object with a new queue ID.
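
The two steps can be sketched with stand-in objects. This is plain Python, not a real gem5 configuration: `make_obj`, `queues_in_use` and the object names are hypothetical, and only the `eventq_index` and `sim_quantum` property names come from the description above. The quantum value assumes one tick is 1 ps, so 10**9 ticks would be 1 ms.

```python
from types import SimpleNamespace


def make_obj(name, eventq_index=0):
    # Stand-in for a simulator object; everything starts on queue 0.
    return SimpleNamespace(name=name, eventq_index=eventq_index)


# Stand-ins for the root object, two CPUs and two devices.
root = SimpleNamespace(sim_quantum=0)
cpus = [make_obj("cpu%d" % i) for i in range(2)]
devices = [make_obj("uart"), make_obj("intc")]

# Step 1: give each CPU its own queue; devices stay on queue 0.
for idx, cpu in enumerate(cpus):
    cpu.eventq_index = idx + 1

# Step 2: set the quantum on the root (assumed: 1 tick = 1 ps, so 1 ms).
root.sim_quantum = 10**9


def queues_in_use(objects):
    # One queue exists per distinct queue ID that appears on any object.
    return sorted({obj.eventq_index for obj in objects})


queue_ids = queues_in_use(cpus + devices)
```

Here `queue_ids` ends up as `[0, 1, 2]`: the device queue plus one queue per CPU.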

You should keep in mind that very few objects support communication across parallel event queues. I have only used it to simulate multiple cores in parallel in KVM, but I think Steve has used it to simulate two parallel systems communicating over Ethernet. Since gem5 has a tendency to make unsynchronized cross-object method calls, you generally can't have devices in the same system in multiple different threads.

KVM abuses barrier synchronization in gem5 slightly, making it behave more like the kind of relaxed synchronization you find in Graphite. Since memory accesses in KVM are instantaneous, execution will always be correct. Synchronization is only needed to keep devices in sync across cores. The way we solve inter-thread calls is by migrating to the target object's event queue (the interrupt controller's queue for interrupts and the VM's queue for MMIO). This is implemented by releasing the current event queue's lock, taking the target queue's lock, and then updating the thread's current event queue pointer. By releasing the lock of the current queue before taking the target's, we never hold two queue locks at once, which avoids several common deadlock scenarios.
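
The migration steps described above can be sketched in plain Python with `threading` locks. This is only an illustration of the release/acquire/update order, not the gem5 implementation; `EventQueue`, `enter_queue`, `current_queue` and `migrate_to` are names invented for the example.

```python
import threading
from contextlib import contextmanager


class EventQueue:
    # Stand-in for an event queue protected by its own lock.
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()


_tls = threading.local()  # holds each thread's current queue pointer


def enter_queue(queue):
    # A simulation thread takes its home queue's lock when it starts running.
    queue.lock.acquire()
    _tls.current = queue


def current_queue():
    return _tls.current


@contextmanager
def migrate_to(target):
    home = _tls.current
    if target is home:
        yield
        return
    home.lock.release()      # 1) release the current queue's lock
    target.lock.acquire()    # 2) take the target queue's lock
    _tls.current = target    # 3) update the thread's current queue pointer
    try:
        yield
    finally:
        # Migrate back in the same order: release first, then acquire.
        target.lock.release()
        home.lock.acquire()
        _tls.current = home


cpu_q = EventQueue("cpu0")
dev_q = EventQueue("devices")
enter_queue(cpu_q)
with migrate_to(dev_q):
    # e.g. an MMIO access would be handled here, on the device queue
    held_inside = (cpu_q.lock.locked(), dev_q.lock.locked())
held_after = (cpu_q.lock.locked(), dev_q.lock.locked())
```

Because the home lock is dropped before the target lock is requested, a thread never waits for one queue while holding another, so two threads migrating towards each other cannot deadlock.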

It's probably possible to implement something similar to the relaxed synchronization approach I used for KVM in the atomic CPU (especially when using fastmem). The main problem here is probably the decode cache. Doing it with a "proper" memory system on the other hand is likely going to be very challenging due to express snoops.

//Andreas

On 2014-12-04 23:08, Gabe Black via gem5-dev wrote:
How do you set that up? Does it happen automatically? That sounds pretty
handy.

Gabe

On Thu, Dec 4, 2014 at 3:01 PM, Nilay Vaish via gem5-dev <[email protected]>
wrote:

The simulator.  As in different cores of the simulated system are
simulated on different threads of the host system.

--
Nilay


On Thu, 4 Dec 2014, Gabe Black via gem5-dev wrote:

  This is somewhat tangential, but are you saying the simulator is
multithreaded now? Or just your simulation?

Gabe

On Thu, Dec 4, 2014 at 10:03 AM, Andreas Sandberg via gem5-dev <
[email protected]> wrote:

  On 04/12/14 16:10, Nilay Vaish via gem5-dev wrote:
  I have been trying to run the kvm cpu when using multiple cores.  With
single-threaded simulation, the simulation stops making progress if the
simulated system has more than 4 cores.  With multi-threaded simulation, I
do not see any progress even when two cores are being simulated.  For the
multi-threaded simulation, I made the changes suggested in the comment for
the changeset:   10157:5c2ecad1a3c9.  So, how many cores have others tested
kvm cpu with?  Is there something that I might not be doing right?


I reported scalability numbers up to 8 cores for one of the Splash 2
benchmarks in my thesis, so 8 cores definitely work. IIRC, I tested it
on 32 cores as well, but I didn't report those numbers.

There are three issues you might be running into:

  * There might be devices (CPU child objects) that don't live in the
right thread.
  * The quantum might be too large (I never managed to get anything more
than 1ms to work).
  * Newly introduced bugs.

The code fragment I used in my old scripts was something like this:

     if not no_kvm and cpus > 1:
         test_sys.eventq_index = 0
         for idx, cpu in enumerate(test_sys.cpu_boot):
             for obj in cpu.descendants():
                 obj.eventq_index = test_sys.eventq_index
             cpu.eventq_index = idx + 1

The fragment above ensures that any descendants of the CPU are assigned
to the device thread and only the CPU lives in a separate thread.
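
To see what the fragment does without running gem5, here is a small stand-alone sketch. `Node` is a hypothetical stand-in for a simulator object, and its `descendants()` is assumed to yield the node itself before its children, which is why the CPU's own index has to be set after the inner loop.

```python
class Node:
    # Hypothetical stand-in for an object in the configuration tree.
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.eventq_index = 0

    def descendants(self):
        # Assumption: the walk yields the node itself first, then its subtree.
        yield self
        for child in self.children:
            yield from child.descendants()


def assign_queues(cpus, device_queue=0):
    for idx, cpu in enumerate(cpus):
        # Pin the whole CPU subtree (TLBs, interrupt controller, ...) to the
        # device queue.
        for obj in cpu.descendants():
            obj.eventq_index = device_queue
        # Then move only the CPU object itself to its own queue.
        cpu.eventq_index = idx + 1


cpus = [
    Node("cpu%d" % i, [Node("itb", [Node("walker")]), Node("interrupts")])
    for i in range(2)
]
assign_queues(cpus)

cpu_queues = [cpu.eventq_index for cpu in cpus]
child_queues = [
    obj.eventq_index
    for cpu in cpus
    for obj in cpu.descendants()
    if obj is not cpu
]
```

After the call, `cpu_queues` is `[1, 2]` and every entry in `child_queues` is 0.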

//Andreas



_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
