Re: [gem5-dev] KVM CPU when using multiple cores

Andreas Sandberg via gem5-dev Thu, 04 Dec 2014 15:54:12 -0800

PDES in gem5 is implemented using a fairly standard quantum-based PDES approach (similar to WWT) where a barrier across all threads is enforced every N cycles. Objects (in practice, sub-trees of the object graph) can be assigned to their own event queues. Cross-queue scheduling of events is supported as long as the new events are scheduled at least N ticks into the future to ensure determinism.
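
The quantum scheme described above can be illustrated with a small stand-alone sketch. This is not gem5 code: the EventQueue class, simulate() and the QUANTUM value are hypothetical names chosen for the illustration. Each thread services its own queue up to the end of the current quantum, then waits at a barrier, and cross-queue events are always scheduled at least one quantum ahead.

```python
import heapq
import threading

QUANTUM = 10  # the "N" above, in ticks; illustrative value only


class EventQueue:
    """Hypothetical per-thread event queue (not gem5's EventQueue)."""

    def __init__(self, name):
        self.name = name
        self.events = []              # heap of (tick, seq, callback)
        self.lock = threading.Lock()  # protects the heap against cross-queue scheduling
        self.seq = 0
        self.log = []                 # ticks at which this queue ran an event

    def schedule(self, tick, callback):
        with self.lock:
            heapq.heappush(self.events, (tick, self.seq, callback))
            self.seq += 1

    def run_until(self, limit):
        """Service every pending event with tick < limit, in tick order."""
        while True:
            with self.lock:
                if not self.events or self.events[0][0] >= limit:
                    return
                tick, _, callback = heapq.heappop(self.events)
            callback(tick)  # run outside the lock so callbacks may schedule


def simulate(queues, end_tick):
    barrier = threading.Barrier(len(queues))

    def worker(queue):
        for start in range(0, end_tick, QUANTUM):
            queue.run_until(start + QUANTUM)
            barrier.wait()  # nobody enters the next quantum early

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


a, b = EventQueue("a"), EventQueue("b")


def ping(tick):
    a.log.append(tick)
    b.schedule(tick + QUANTUM, pong)  # cross-queue: at least N ticks ahead


def pong(tick):
    b.log.append(tick)
    a.schedule(tick + QUANTUM, ping)


a.schedule(0, ping)
simulate([a, b], end_tick=50)
print(a.log, b.log)
```

Because a cross-queue event always lands in a later quantum than the one in which it was created, the receiving thread cannot already have simulated past it, so the result does not depend on how the host schedules the threads.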

To enable the PDES functionality in gem5, you need to do two things from your configuration script: 1) assign objects to event queues (the eventq_index property) and 2) set the simulation quantum on the root object (the sim_quantum property). By default, all objects are assigned to event queue 0 and new queues are created as soon as the simulator encounters an object with a new queue ID.
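
As a rough sketch of those two steps, the snippet below uses a stand-in class instead of real gem5 SimObjects so that it runs on its own. StubObject, enable_pdes() and queue_ids() are hypothetical; only the eventq_index and sim_quantum property names come from the text above. The stub does not model how gem5 resolves parameter defaults, so the CPU's child objects are pinned to queue 0 explicitly. The quantum of 10**9 ticks is 1 ms only under the assumption that one tick is 1 ps.

```python
class StubObject:
    """Stand-in for a gem5 SimObject (hypothetical; for illustration only)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.eventq_index = 0  # everything starts on event queue 0

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()


def enable_pdes(root, cpus, quantum_ticks):
    # Step 2 from the text: the quantum is set once, on the root object.
    root.sim_quantum = quantum_ticks
    # Step 1 from the text: per-object queue assignment.
    for idx, cpu in enumerate(cpus):
        for obj in cpu.descendants():
            obj.eventq_index = 0    # child objects stay on the device queue
        cpu.eventq_index = idx + 1  # each CPU gets its own queue


def queue_ids(root):
    """Set of queue IDs used in the tree, i.e. the queues that would exist."""
    return {root.eventq_index} | {o.eventq_index for o in root.descendants()}


cpus = [StubObject("cpu%d" % i, [StubObject("cpu%d.interrupts" % i)])
        for i in range(2)]
system = StubObject("system", cpus + [StubObject("uart")])
root = StubObject("root", [system])

enable_pdes(root, cpus, quantum_ticks=10**9)  # 1 ms if a tick is 1 ps (assumption)
print(sorted(queue_ids(root)))
```

In a real configuration script the same attribute assignments would be made on the actual system, CPU and root objects before the simulation is instantiated.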

You should keep in mind that very few objects support communication across parallel event queues. I have only used it to simulate multiple cores in parallel in KVM, but I think Steve has used it to simulate two parallel systems communicating over Ethernet. Since gem5 has a tendency to make unsynchronized cross-object method calls, you generally can't have devices in the same system in multiple different threads.

KVM abuses barrier synchronization in gem5 slightly, making it behave more like the kind of relaxed synchronization you find in Graphite. Since memory accesses in KVM are instantaneous, execution will always be correct. Synchronization will only be needed to keep devices in sync across cores. The way we solve inter-thread calls is by migrating to the target object's event queue (the interrupt controller's queue for interrupts and the VM's queue for MMIO). This is implemented by releasing the current event queue's lock, taking the target thread's lock, and then updating the thread's current event queue pointer. By releasing the lock of the current queue, we avoid multiple common deadlock scenarios.
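
The migration step can be sketched in a few lines of stand-alone Python. This is an illustration of the locking order described above, not gem5's implementation: EventQueue, migrate_to() and the thread-local "current queue" pointer are hypothetical names. The point is that the home queue's lock is dropped before the target's lock is taken, so a thread never holds two queue locks at once.

```python
import threading
from contextlib import contextmanager


class EventQueue:
    """Hypothetical event queue; only the lock matters for this sketch."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()


_current = threading.local()  # per-thread "current event queue" pointer


@contextmanager
def migrate_to(target):
    home = _current.queue
    if target is home:
        yield
        return
    home.lock.release()      # 1) release the current queue's lock
    target.lock.acquire()    # 2) take the target queue's lock
    _current.queue = target  # 3) update the current-queue pointer
    try:
        yield
    finally:
        # Migrate back the same way: release first, then re-acquire home.
        target.lock.release()
        home.lock.acquire()
        _current.queue = home


device_queue = EventQueue("devices")
device = {"mmio_writes": 0}  # state owned by the device queue


def cpu_thread(home, accesses):
    home.lock.acquire()  # a running CPU thread holds its own queue's lock
    _current.queue = home
    try:
        for _ in range(accesses):
            with migrate_to(device_queue):
                device["mmio_writes"] += 1  # safe: device lock is held here
    finally:
        home.lock.release()


threads = [threading.Thread(target=cpu_thread,
                            args=(EventQueue("cpu%d" % i), 1000))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(device["mmio_writes"])
```

If a thread instead kept its home lock while waiting for the target lock, two CPUs calling into each other's queues at the same time could each wait on a lock the other holds.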

It's probably possible to implement something similar to the relaxed synchronization approach I used for KVM in the atomic CPU (especially when using fastmem). The main problem here is probably the decode cache. Doing it with a "proper" memory system on the other hand is likely going to be very challenging due to express snoops.


//Andreas

On 2014-12-04 23:08, Gabe Black via gem5-dev wrote:

How do you set that up? Does it happen automatically? That sounds pretty
handy.

Gabe

On Thu, Dec 4, 2014 at 3:01 PM, Nilay Vaish via gem5-dev <[email protected]>
wrote:

The simulator.  As in different cores of the simulated system are
simulated on different threads of the host system.

--
Nilay


On Thu, 4 Dec 2014, Gabe Black via gem5-dev wrote:

This is somewhat tangential, but are you saying the simulator is
multithreaded now? Or just your simulation?

Gabe

On Thu, Dec 4, 2014 at 10:03 AM, Andreas Sandberg via gem5-dev <
[email protected]> wrote:

  On 04/12/14 16:10, Nilay Vaish via gem5-dev wrote:

I have been trying to run the kvm cpu when using multiple cores.  With
single threaded simulation, the simulation stops making progress if the
simulated system has more than 4 cores.  With multi-threaded simulation, I
do not see any progress even when two cores are being simulated.  For the
multi-threaded simulation, I made the following changes as suggested in
the comment for the changeset:   10157:5c2ecad1a3c9.  So, how many cores
have others tested kvm cpu with?  Is there something that I might not be
doing right?

I reported scalability numbers up to 8 cores for one of the Splash 2
benchmarks in my thesis, so 8 cores definitely work. IIRC, I tested it
on 32 cores as well, but I didn't report those numbers.

There are three issues you might be running into:

  * There might be devices (CPU child objects) that don't live in the
right thread.
  * The quantum might be too large (I never managed to get anything more
than 1ms to work).
  * Newly introduced bugs.

The code fragment I used in my old scripts was something like this:

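     # Only needed when running KVM CPUs on more than one core.
     if not no_kvm and cpus > 1:
         # Devices stay on event queue 0.
         test_sys.eventq_index = 0
         for idx, cpu in enumerate(test_sys.cpu_boot):
             # Pin the CPU's child objects to the device queue ...
             for obj in cpu.descendants():
                 obj.eventq_index = test_sys.eventq_index
             # ... and give the CPU itself its own queue (1, 2, ...).
             cpu.eventq_index = idx + 1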

The fragment above ensures that any descendants of the CPU are assigned
to the device thread and only the CPU lives in a separate thread.

//Andreas




_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev

