Alexander Graf wrote:
On 24.06.2010, at 00:21, Anthony Liguori wrote:

On 06/23/2010 04:09 PM, Andre Przywara wrote:
Hi,

these three patches add basic NUMA pinning to KVM. Based on a user-provided
assignment, parts of the guest's memory are bound to different host nodes.
This should increase performance in large virtual machines and on loaded
hosts.
The patches are quite basic (but work); I am sending them as an RFC to get
some feedback before implementing things in vain.

To use it you need to provide a guest NUMA configuration; this can be as
simple as "-numa node -numa node" to create two nodes in the guest. You then
pin these guest nodes to host nodes with a separate command-line option:
"-numa pin,nodeid=0,host=0 -numa pin,nodeid=1,host=2" (a complete example
follows below). This separation of host and guest configuration may sound a
bit complicated, but it was requested the last time I submitted a similar
version.
I refrained from binding the vCPUs to physical CPUs for now, but this can be
added later with a "cpubind" option to "-numa pin,". It could also be done
from a management application by using sched_setaffinity().
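
For example, a complete command line combining the guest topology and the
pinning could look like this (memory sizes, CPU ranges and host node numbers
are only illustrative values):

  qemu-kvm -m 4096 -smp 4 \
      -numa node,nodeid=0,cpus=0-1,mem=2048 \
      -numa node,nodeid=1,cpus=2-3,mem=2048 \
      -numa pin,nodeid=0,host=0 \
      -numa pin,nodeid=1,host=2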

Please note that this is currently made for qemu-kvm, although I am not up to
date on the current status of upstream QEMU's true SMP capabilities. The final
patch will be made against upstream QEMU anyway.
Also, this is currently for Linux hosts (are any other KVM hosts alive?) and
for PC guests only. I think both restrictions can be lifted easily if someone
requests it (and gives me a pointer to further information).

Please comment on the approach in general and the implementation.
If we extended the -mem-path integration with -numa such that a different path
could be used for each NUMA node (and we let an explicit file be specified
instead of just a directory), then, if I understand correctly, we could use
numactl without any specific integration in qemu.  Does this sound correct?

IOW:

qemu -numa node,mem=1G,nodeid=0,cpus=0-1,memfile=/dev/shm/node0.mem \
     -numa node,mem=2G,nodeid=1,cpus=1-2,memfile=/dev/shm/node1.mem

It's then possible to say:

numactl --file /dev/shm/node0.mem --interleave=0,1
numactl --file /dev/shm/node1.mem --membind=2

I think this approach is nicer because it gives the user a lot more flexibility 
without having us chase other tools like numactl.  For instance, your patches 
only support pinning and not interleaving.

Interesting idea.

So who would create the /dev/shm/nodeXX files?
Currently it is QEMU: it creates a somewhat unique filename, opens the file and immediately unlinks it. The difference would be to name the file after the option and not unlink it.
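
Just to make that concrete, the allocation pattern today is roughly the
following (a simplified sketch, not the actual QEMU code; the memfile idea
would simply use the user-supplied name and skip the unlink):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Sketch of the current -mem-path style backing file allocation:
 * create a "somewhat unique" file in the given directory, unlink it
 * immediately, size it and map it shared. */
static void *alloc_mem_file(const char *dir, size_t size)
{
    char path[4096];
    int fd;
    void *area;

    snprintf(path, sizeof(path), "%s/qemu_back_mem.XXXXXX", dir);
    fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);       /* name goes away, memory stays until QEMU exits */

    if (ftruncate(fd, size) < 0) {
        close(fd);
        return NULL;
    }
    area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return area == MAP_FAILED ? NULL : area;
}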

> I can imagine starting numactl before qemu, even though that's
> cumbersome. I don't think it's feasible to start numactl after
> qemu is running; that'd involve way too much magic, so I'd prefer
> qemu to call numactl itself.
With the current code the files would not exist before QEMU has allocated its
RAM, and after that QEMU could already have touched pages before numactl gets
to set the policy. To avoid this I'd like to see the pinning done from within
QEMU.

I am not sure whether calling numactl via system() and friends is acceptable;
I'd prefer to issue the syscalls directly (as in patch 3/3) and pull the
necessary options into the "-numa pin,..." command line. We could mimic
numactl's syntax here.
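
To illustrate what "issuing the syscalls directly" would mean, here is a
minimal sketch using mbind(2) (prototype from libnuma's <numaif.h>; error
handling and node-count checks omitted, and this is not the code from
patch 3/3):

#include <numaif.h>
#include <string.h>

/* Bind one guest node's RAM area to a single host node. */
static int pin_to_host_node(void *area, size_t size, int host_node)
{
    unsigned long nodemask[16];   /* 1024 node bits, plenty for this sketch */

    memset(nodemask, 0, sizeof(nodemask));
    nodemask[host_node / (8 * sizeof(unsigned long))] |=
        1UL << (host_node % (8 * sizeof(unsigned long)));

    /* MPOL_BIND pins the area to the given host node; MPOL_INTERLEAVE
     * would cover the interleaving case Anthony mentioned. */
    return mbind(area, size, MPOL_BIND,
                 nodemask, sizeof(nodemask) * 8, 0);
}

The call would have to be made right after the area is allocated and before
any of its pages are touched, which is exactly why I'd like it to happen
inside QEMU.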

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
