Avi Kivity wrote:
Anthony Liguori wrote:

I see no compelling reason to do cpu placement internally. It can be done quite effectively externally.

Memory allocation is tough, but I don't think it's out of reach. Looking at the numactl man page, you can do:

numactl  --offset=1G  --length=1G --membind=1 --file /dev/shm/A --touch
      Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1.


Since we can already create VMs with the -mem-path argument, if you create a 2GB guest and want it to span two NUMA nodes, you could do:

numactl  --offset=0G  --length=1G --membind=0 --file /dev/shm/A --touch
numactl  --offset=1G  --length=1G --membind=1 --file /dev/shm/A --touch

And then create the VM with:

qemu-system-x86_64 -mem-path /dev/shm/A -m 2G ...
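
To make the pattern above concrete for more than two nodes, here is a minimal sketch that only prints the numactl invocations rather than running them. It assumes an even split with the same number of gigabytes on each host node; print_numa_binds and its arguments are hypothetical names used for illustration and are not part of numactl or QEMU.

```shell
#!/bin/sh
# Sketch: print the numactl commands that split a tmpfs-backed guest memory
# file evenly across host NUMA nodes 0..(nodes-1).
# print_numa_binds is a hypothetical helper, not a numactl or QEMU feature.
print_numa_binds() {
    file=$1    # backing file, e.g. /dev/shm/A
    nodes=$2   # number of host nodes to span
    size=$3    # gigabytes per node (assumed equal for every node)
    i=0
    while [ "$i" -lt "$nodes" ]; do
        # Node i gets the i-th chunk of the file.
        echo "numactl --offset=$((i * size))G --length=${size}G --membind=$i --file $file --touch"
        i=$((i + 1))
    done
}
```

Calling print_numa_binds /dev/shm/A 2 1 prints the same two commands as above; the output can be reviewed and then piped to sh before starting the guest.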

What's best about this approach is that you get full access to everything numactl is capable of: interleaving, rebalancing, etc.

It looks horribly difficult and unintuitive. It forces you to use -mem-path (which is an abomination; the only reason it lives is that we can't allocate large pages without it).

As opposed to inventing new options for QEMU that convey all of the same information a slightly different way? We're stuck with -mem-path so we might as well make good use of it.

The proposed syntax is:

qemu -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3

The new syntax would be:

qemu -smp 4 -numa nodes=2,cpus=1:2:3:4,mem=1G:1G -mem-path /dev/hugetlbfs/foo

Then you would have to look up the thread ids, and do

taskset <vcpu1>
taskset <vcpu2>
taskset <vcpu3>
taskset <vcpu4>
numactl -o 0G -l 1G -m 0 -f /dev/hugetlbfs/foo
numactl -o 1G -l 1G -m 1 -f /dev/hugetlbfs/foo
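
As a rough sketch of the external pinning step, assuming you have already looked up each vcpu's thread id by some means, the taskset calls could be generated like this. The "thread_id:host_cpu" pair format and the name print_vcpu_pins are made up for illustration; the function only prints commands and does not change any affinity itself.

```shell
#!/bin/sh
# Sketch: print one taskset command per vcpu thread.
# Each argument is a hypothetical "thread_id:host_cpu" pair supplied by the
# caller; this helper does not discover thread ids on its own.
print_vcpu_pins() {
    for pair in "$@"; do
        tid=${pair%%:*}   # text before the first colon
        cpu=${pair#*:}    # text after the first colon
        # taskset -cp <cpu-list> <pid> sets the affinity of an existing task.
        echo "taskset -cp $cpu $tid"
    done
}
```

For example, print_vcpu_pins 4001:0 4002:1 prints two taskset lines, where 4001 and 4002 are placeholder thread ids.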

This may look like a lot more work, but specifying NUMA placement only at startup is not going to be nearly enough. What if you have a very large NUMA system and want to rebalance virtual machines? You need a mechanism to do this, and it now has to be exposed through the monitor. In fact, you'll almost certainly end up introducing a taskset-like monitor command and a numactl-like monitor command.

Why reinvent the wheel? Plus, taskset and numactl give you a lot of flexibility. All we're going to do by cooking this stuff into QEMU is artificially limit ourselves.

Regards,

Anthony Liguori