Andi Kleen wrote:
On Sun, Nov 30, 2008 at 10:07:01PM +0200, Avi Kivity wrote:
Right. Allocated from the guest kernel's perspective. This may be
different from the host kernel's perspective.
Linux will delay touching memory until the last moment; Windows will not
(likely it zeros pages on their own nodes, but who knows?).
The problem on Linux is that the first touch is clear_page(), and that
unfortunately happens through the direct mapping before the page is mapped
into the process, so the "detect mapping" trick doesn't quite work (unless
it's a 32-bit highmem page).
It should still be on the same cpu.
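(For reference, a minimal user-space sketch of the first-touch placement
being discussed - my own example, not anything in kvm, assuming libnuma's
headers are available:

#include <numaif.h>     /* get_mempolicy(); link with -lnuma */
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        int node = -1;

        if (p == MAP_FAILED)
                return 1;

        p[0] = 0;       /* first touch: the backing page is allocated now */

        /* MPOL_F_NODE | MPOL_F_ADDR: report the node the page sits on */
        if (get_mempolicy(&node, NULL, 0, p, MPOL_F_NODE | MPOL_F_ADDR) == 0)
                printf("page allocated on node %d\n", node);
        return 0;
}

The page should come from the node of the cpu doing the touch.)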
Ok, one could migrate it on mapping. While the data is still cache-hot
that shouldn't be too expensive. Thinking about it again, it might
actually be a reasonable approach.
It could also work for normal apps - move code and data to the local node.
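Roughly what I have in mind, as a user-space sketch of mine using
move_pages() (nothing that exists in the kernel today):

#define _GNU_SOURCE     /* sched_getcpu() */
#include <numa.h>       /* numa_node_of_cpu(); link with -lnuma */
#include <numaif.h>     /* move_pages() */
#include <sched.h>

/* Move one freshly mapped page to the node of the CPU that maps it. */
static int migrate_to_local_node(void *page)
{
        void *pages[1]  = { page };
        int   nodes[1]  = { numa_node_of_cpu(sched_getcpu()) };
        int   status[1] = { -1 };

        /* pid 0 = current process; MPOL_MF_MOVE only moves our own pages */
        if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) < 0)
                return -1;
        return status[0];       /* node the page landed on, or -errno */
}

While the data is still in cache the copy is cheap, which is the point.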
But again, we don't have any guest mapping information when we're
running under ept/npt; only the first access. If we're willing to sacrifice
memory, we can get the first access per virtual node.
In our case, the application is the guest kernel, which does know.
It knows but it doesn't really care all that much. The only thing
that counts is the end performance in this case.
Well, testing is the only way to know. I'm particularly interested in
how Windows will perform, since we know so little about its internals.
From some light googling, it looks like Windows has a home node for a
thread, and will allocate pages from the home node even when the thread
is executing on some other node temporarily. It also does automatic
page migration in some cases.
The difference is, Linux (as a guest) will try to reuse freed pages from
an application or pagecache, knowing which node they belong to.
I agree that if all you do is HPC style computation (boot a kernel and
one app with one process per cpu), then the heuristics work well.
Or if there's a way to detect unmapping/remapping.
Sure, if you're willing to drop ept/npt.
It is certainly not perfect and has holes (like any heuristics),
but it has the advantage of being fully dynamic.
It also has the advantage of being already implemented (apart from fake
SRAT tables, which aren't necessary for HPC apps).
What do you mean?
Which part? Being already implemented? Like I said earlier, right now
kvm allocates guest memory in the context of the vcpu thread that first
touches it. Given that Linux prefers allocating from the current node,
we already implement the first-touch heuristic.
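To illustrate the fake-SRAT part (a sketch of mine, not existing kvm/qemu
code; bind_guest_region() is a made-up helper): if the guest sees, say, two
nodes, the host side could pin each guest-node region of the RAM mapping to
the matching host node with mbind(), instead of relying purely on first
touch:

#include <numaif.h>     /* mbind(); link with -lnuma */
#include <stddef.h>

/* Bind [ram + off, ram + off + len) to a single host node. */
static int bind_guest_region(void *ram, size_t off, size_t len, int node)
{
        unsigned long nodemask = 1UL << node;

        return mbind((char *)ram + off, len, MPOL_BIND,
                     &nodemask, sizeof(nodemask) * 8, 0);
}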
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.