André Przywara wrote:
I was partly wrong: the code is in BOCHS CVS, but not in qemu. It wasn't
in the BOCHS 2.3.7 release, which qemu is currently based on. Could you pull
the latest BIOS code from BOCHS CVS into qemu? This would give us the
firmware interface for free, and I could more easily port my patches.

Working on that right now. BOCHS CVS has diverged a fair bit from what we have, so I'm adjusting our current patches and doing regression testing.

What's actually bothering you about the libnuma dependency? I
could use the Linux mbind syscall directly, but I think using a library
is saner (and probably more portable).
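Just to make the comparison concrete, here is a rough sketch of the two options (illustrative only; the function names are made up and this is not the actual patch code):

/*
 * Illustrative sketch only: binding a (page-aligned) guest RAM region
 * to a node, once with libnuma and once with the raw mbind syscall.
 */
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <numa.h>           /* only needed for the libnuma variant */

#ifndef MPOL_BIND
#define MPOL_BIND 2         /* kernel ABI value, normally from <numaif.h> */
#endif

/* With libnuma: one call, the library hides the nodemask handling. */
static void bind_region_libnuma(void *ram, size_t len, int node)
{
    if (numa_available() < 0)
        return;                          /* kernel without NUMA support */
    numa_tonode_memory(ram, len, node);
}

/* Without libnuma: the raw syscall, no extra build dependency. */
static void bind_region_raw(void *ram, unsigned long len, int node)
{
    unsigned long nodemask = 1UL << node;

    /* mbind(addr, len, mode, nodemask, maxnode, flags) */
    syscall(SYS_mbind, ram, len, MPOL_BIND,
            &nodemask, sizeof(nodemask) * 8, 0);
}

The raw syscall is only a handful of extra lines, so either way the dependency could easily stay optional.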

You're making a default policy decision (pin nodes and pin CPUs). You're assuming that Linux will do the wrong thing by default and that the decision we'll be making is better.

That policy decision requires more validation. We need benchmarks showing what performance looks like with and without pinning, and we need to understand whether the bad performance is a Linux bug that can be fixed or whether it's something fundamental.

What I'm concerned about is that it'll make the default situation worse. I advocated punting to management tools because that at least gives the user the ability to make their own decisions, which means you don't have to prove that this is the correct default decision.

I don't care about a libnuma dependency. Library dependencies are fine as long as they're optional.

Almost right, but simply calling qemu-system-x86_64 can lead to bad situations. I recently saw VCPU #0 scheduled on one node and VCPU #1 on another. This leads to random (and probably excessive) remote accesses from the VCPUs, since the guest assumes uniform memory access.

That seems like Linux is behaving badly, no? Can you describe the situation more?

That is just my observation. I have to do more research to get a decent
explanation, but I think the problem is that at this early stage the
threads barely touch any memory, so Linux tries to spread them out as
evenly as possible. Just a quick run on a quad-node machine with 16 cores
in total:

How does memory migration fit into all of this though? Statistically speaking, if your NUMA guest is behaving well, it should be easy to recognize the groupings and perform the appropriate page migration. I would think even the most naive page migration tool would be able to do the right thing.
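As a rough illustration (the function below is invented, not taken from any existing tool), even a very naive migration step could be as small as this, using libnuma's wrapper around the migrate_pages(2) syscall:

/*
 * Illustrative sketch of a naive page-migration step: once a grouping
 * has been recognized, move every page of the qemu process that sits
 * on from_node over to to_node.  numa_migrate_pages() wraps the
 * migrate_pages(2) syscall (libnuma 2.x bitmask API assumed).
 */
#include <sys/types.h>
#include <stdio.h>
#include <numa.h>

static int migrate_guest_pages(pid_t qemu_pid, int from_node, int to_node)
{
    struct bitmask *from = numa_allocate_nodemask();
    struct bitmask *to   = numa_allocate_nodemask();
    int ret;

    numa_bitmask_setbit(from, from_node);
    numa_bitmask_setbit(to, to_node);

    ret = numa_migrate_pages(qemu_pid, from, to);
    if (ret < 0)
        perror("numa_migrate_pages");

    numa_free_nodemask(from);
    numa_free_nodemask(to);
    return ret;
}

The mechanism itself is cheap to drive once the grouping is known.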

NUMA systems are expensive. If a customer cares about performance (as opposed to just getting more memory), then I think tools like numactl are pretty well known.

Well, whether they're expensive depends, especially if I think of your employer ;-) In
fact every AMD dual-socket server is NUMA, and Intel will join the game next year.

But the NUMA characteristics of an AMD system are relatively minor. I doubt that static pinning would be what most users want, since it could reduce overall system performance noticeably.

Even with more traditional NUMA systems, the cost of remote memory access is often outweighed by the opportunity cost of leaving a CPU idle. That's what pinning does: it potentially leaves CPUs idle.

Additionally, one could use some kind of home node: one could temporarily change a VCPU's affinity and later return it to the optimal affinity (where its memory is located) without specifying it again.
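A minimal sketch of what I mean (all names invented, this is not part of the patches): remember the home node's CPU set once per VCPU thread, and restoring it later needs no further input from the user.

/*
 * Illustrative "home node" sketch: record the CPU set of the node that
 * holds the VCPU's memory once; the affinity can then be loosened and
 * restored at will without the user specifying it again.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

struct vcpu_home {
    pid_t     tid;       /* VCPU thread id */
    cpu_set_t home_set;  /* CPUs of the node where its memory lives */
};

/* Temporarily let the scheduler place the VCPU on any of ncpus CPUs. */
static int vcpu_unpin(const struct vcpu_home *v, int ncpus)
{
    cpu_set_t all;
    int i;

    CPU_ZERO(&all);
    for (i = 0; i < ncpus; i++)
        CPU_SET(i, &all);
    return sched_setaffinity(v->tid, sizeof(all), &all);
}

/* Later: return the VCPU to the node where its memory is located. */
static int vcpu_return_home(const struct vcpu_home *v)
{
    return sched_setaffinity(v->tid, sizeof(v->home_set), &v->home_set);
}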

Please resubmit with the first three patches at the front. I don't think exposing NUMA attributes to a guest is at all controversial, so that part is relatively easy to apply.

I'm not saying that the last patch can't be applied, but I don't think it's as obvious that it's going to be a win when you start doing performance tests.

Regards,

Anthony Liguori
