Hi Andre,

This patch series needs to be posted to qemu-devel. I know QEMU doesn't do true SMP yet, but it will in the relatively near future. Either way, some of the design points need review from a larger audience than is present on kvm-devel.

I'm not a big fan of the libnuma dependency. I'm willing to concede this if there's wide agreement that we should support this directly in QEMU.

I don't think there's such a thing as a casual NUMA user. The default NUMA policy in Linux is node-local memory. As long as a VM is smaller than a single node, everything will work out fine.
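To make that concrete: a sysadmin can tell whether a guest fits into a single node just by looking at the topology. A rough sketch (the exact node count and sizes will obviously differ per machine):

  numactl --hardware    # list nodes with their CPUs and memory sizes
  numactl --show        # show the current (default) NUMA policy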

In the event that the VM is larger than a single node, if a user is creating it via qemu-system-x86_64, they're going to either not care at all about NUMA, or be familiar enough with the numactl tools that they'll probably just want to use them. Once you've got your head around the fact that VCPUs are just threads and the memory is just a shared memory segment, any knowledgeable sysadmin will have no problem doing whatever sort of NUMA layout they want.
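As a hedged sketch of what I mean (the disk image name is just a placeholder, and the node numbers are arbitrary):

  # bind the whole QEMU process - VCPU threads and guest RAM - to host node 0
  numactl --cpunodebind=0 --membind=0 qemu-system-x86_64 -m 1024 -smp 2 disk.img

  # or interleave the guest's memory across host nodes 0 and 1
  numactl --interleave=0,1 qemu-system-x86_64 -m 4096 -smp 4 disk.img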

The other case is where management tools are creating VMs. In this case, it's probably better to use numactl as an external tool because then it keeps things consistent wrt CPU pinning.

There's also a good argument for not introducing CPU pinning directly to QEMU. There are multiple ways to do CPU pinning effectively: you can use taskset, cpusets, or even something like libcgroup.
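A rough sketch of the taskset and cpuset routes (the thread ID 4242 is a placeholder; in practice you'd read the VCPU thread IDs from /proc/<pid>/task):

  # pin one VCPU thread to host CPU 3
  taskset -c -p 3 4242

  # or with cpusets: create a set restricted to CPUs 2-3 on node 0
  # and move the thread into it
  mount -t cpuset none /dev/cpuset
  mkdir /dev/cpuset/vm1
  echo 2-3  > /dev/cpuset/vm1/cpus
  echo 0    > /dev/cpuset/vm1/mems
  echo 4242 > /dev/cpuset/vm1/tasks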

If you refactor the series so that the libnuma patch is the very last one and submit to qemu-devel, I'll review and apply the first three patches. We can continue to discuss the last patch independently of the first three if needed.

Regards,

Anthony Liguori

Andre Przywara wrote:
Hi,

this patch series introduces support for multiple NUMA nodes within KVM guests.
This is the second try, incorporating several requests from the list:
- use the QEMU firmware configuration interface instead of CMOS-RAM
- detect presence of libnuma automatically; it can be disabled with
  ./configure --disable-numa
  This only applies to the host side; the command line and guest (BIOS)
  side are always built and functional, although that configuration
  is only useful for research and debugging.
- use a more flexible command line interface allowing:
  - specifying the distribution of memory across the guest nodes:
    mem:1536M;512M
  - specifying the distribution of the CPUs:
    cpu:0-2;3
  - specifying the host nodes the guest nodes should be pinned to:
    pin:3;2
All of these options are optional; if mem or cpu is omitted, the resources are split equally across all guest nodes. Please note that, at least in Linux, SRAT takes precedence over E820, so the total usable memory will be the sum specified with the mem: option (although QEMU will still allocate the amount given with -m). If pin: is omitted, the guest nodes will be pinned to those host nodes where the threads happen to be scheduled at start-up time. This requires the (v)getcpu (v)syscall to be usable, which is the case for kernels from 2.6.19 onwards and glibc >= 2.6 (sched_getcpu()). I have a hack for the case where glibc doesn't support this; tell me if you are interested.

The only non-optional argument is the number of guest nodes. A possible command line looks like:
-numa 3,mem:1024M;512M;512M,cpu:0-1;2;3
Please note that you have to quote the semicolons on the shell.
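For illustration, the same example with the semicolons protected by single quotes (the disk image name is just a placeholder):

  qemu-system-x86_64 -m 2048 -smp 4 -numa '3,mem:1024M;512M;512M,cpu:0-1;2;3' disk.img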

The monitor command is left out for now and will be sent later.

Please apply.

Regards,
Andre.

Signed-off-by: Andre Przywara <[EMAIL PROTECTED]>

