Avi Kivity wrote:
Anthony Liguori wrote:
I see no compelling reason to do cpu placement internally. It can be
done quite effectively externally.
Memory allocation is tough, but I don't think it's out of reach.
Looking at the numactl man page, you can do:
numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch
Bind the second gigabyte in the tmpfs file /dev/shm/A to node 1.
Since we can already create VMs with the -mem-path argument, if you
create a 2GB guest and want it to span two NUMA nodes, you could do:
numactl --offset=0G --length=1G --membind=0 --file /dev/shm/A --touch
numactl --offset=1G --length=1G --membind=1 --file /dev/shm/A --touch
And then create the VM with:
qemu-system-x86_64 -mem-path /dev/shm/A -m 2G ...
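The same binding generalizes to more nodes. A minimal sketch, assuming the
guest memory is split into equal slices, one per host node starting at node 0;
numa_bind_cmds is a hypothetical helper name, and it only prints the numactl
commands rather than running them:

```shell
#!/bin/sh
# Hypothetical helper: print one numactl command per node, binding
# consecutive equal-sized slices of the backing file to host nodes 0..N-1.
# It is a dry run; pipe the output to sh to actually apply the policy.
numa_bind_cmds() {
    file=$1      # tmpfs/hugetlbfs file later passed to -mem-path
    nodes=$2     # number of host nodes the guest should span
    slice_gb=$3  # size of each slice, in gigabytes
    node=0
    while [ "$node" -lt "$nodes" ]; do
        offset=$((node * slice_gb))
        echo "numactl --offset=${offset}G --length=${slice_gb}G --membind=${node} --file ${file} --touch"
        node=$((node + 1))
    done
}

# Reproduces the two commands above for a 2GB guest spanning two nodes.
numa_bind_cmds /dev/shm/A 2 1
```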
The best part of this approach is that you get full access to everything
numactl is capable of: interleaving, rebalancing, etc.
It looks horribly difficult and unintuitive. It forces you to use
-mem-path (which is an abomination; the only reason it lives is that
we can't allocate large pages without it).
As opposed to inventing new options for QEMU that convey all of the same
information in a slightly different way? We're stuck with -mem-path, so we
might as well make good use of it.
The proposed syntax is:
qemu -numanode node=1,cpu=2,cpu=3,start=1G,size=1G,hostnode=3
The new syntax would be:
qemu -smp 4 -numa nodes=2,cpus=1:2:3:4,mem=1G:1G -mem-path /dev/hugetlbfs/foo
Then you would have to look up the vcpu thread ids and do:
taskset <vcpu1>
taskset <vcpu2>
taskset <vcpu3>
taskset <vcpu4>
numactl -o 0G -l 1G -m 0 -f /dev/hugetlbfs/foo
numactl -o 1G -l 1G -m 1 -f /dev/hugetlbfs/foo
This may look like a lot more work, but specifying NUMA placement only at
startup is not going to be nearly enough. What if you have a very large NUMA
system and want to rebalance virtual machines? You would need a mechanism
to do that, and it would have to be exposed through the monitor. In fact,
you'll almost certainly end up introducing a taskset-like monitor command
and a numactl-like monitor command.
Why reinvent the wheel? Plus, taskset and numactl give you a lot of
flexibility. All we're going to do by cooking this stuff into QEMU is
artificially limit ourselves.
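For the taskset step, a rough sketch of what the external pinning could look
like, assuming the vcpu thread ids have already been looked up. The helper
name pin_cmds and the thread ids in the usage line are made up for
illustration, and the function only prints the commands instead of running
them:

```shell
#!/bin/sh
# Hypothetical helper: for each "tid:cpu" argument, print the taskset
# command that would pin that thread to that host cpu.
# taskset -p operates on an existing task; -c takes a cpu list, not a mask.
pin_cmds() {
    for pair in "$@"; do
        tid=${pair%%:*}   # text before the colon: thread id
        cpu=${pair##*:}   # text after the colon: host cpu number
        echo "taskset -p -c ${cpu} ${tid}"
    done
}

# Example with invented thread ids for a 4-vcpu guest.
pin_cmds 4001:0 4002:1 4003:2 4004:3
```

Because this runs outside QEMU, the same commands can be reissued later with
different cpus to rebalance a running guest.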
Regards,
Anthony Liguori