On 13.11.2008, at 05:35, Amit Shah wrote:

* On Wednesday 12 Nov 2008 22:49:16 Alexander Graf wrote:
On 12.11.2008, at 17:52, Amit Shah wrote:
Hi Alex,

* On Wednesday 12 Nov 2008 21:09:43 Alexander Graf wrote:
Hi,

I was thinking a bit about cross vendor migration recently and since
we're doing open source development, I figured it might be a good idea
to talk to everyone about this.

So why are we having a problem?

In normal operation we don't. If we're running a 32-bit kernel, we can
use SYSENTER to jump from kernel<->userspace. If we're on a 64-bit
kernel with 64-bit userspace, every CPU supports SYSCALL. At least
Linux is being smart on this and does use exactly these two
capabilities in these two cases.
But if we're running in compat mode (64-bit kernel with 32-bit
userspace), things differ. Intel supports only SYSENTER here, while
AMD only supports SYSCALL. Both can still use int80.

Operating systems detect usage of SYSCALL or SYSENTER pretty early on
(Linux does this on vdso). So when we boot up on an Intel machine,
Linux assumes that using SYSENTER in compat mode is fine. Migrating
that machine to an AMD machine breaks this assumption though, since
SYSENTER can't be used in compat mode.
On Linux, this detection is based on the CPU vendor string. If Linux
finds "GenuineIntel", SYSENTER is used in compat mode; if it finds
"AuthenticAMD", SYSCALL is used; and if neither of the two is found,
int80 is used.
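
A minimal sketch of the selection and the migration problem described
above. It is a toy model, not the kernel's actual code; the function
names and the COMPAT_SUPPORT table are made up for illustration and only
encode what this mail states about compat mode.

```python
# Toy model only: which 32-bit syscall entry methods work in compat mode
# (64-bit kernel, 32-bit userspace) on each host vendor, per the text above.
COMPAT_SUPPORT = {
    "GenuineIntel": {"sysenter", "int80"},
    "AuthenticAMD": {"syscall", "int80"},
}


def compat_syscall_method(vendor: str) -> str:
    """Pick the compat-mode entry method from the vendor string the guest sees."""
    if vendor == "GenuineIntel":
        return "sysenter"
    if vendor == "AuthenticAMD":
        return "syscall"
    return "int80"  # unknown vendor: fall back to the method both vendors support


def survives_migration(guest_vendor: str, target_host_vendor: str) -> bool:
    """True if the method chosen at boot still works on the target host."""
    method = compat_syscall_method(guest_vendor)
    return method in COMPAT_SUPPORT[target_host_vendor]
```

With this model, a guest that booted seeing "GenuineIntel" does not
survive a move to an AMD host, while a guest that saw an unknown vendor
string falls back to int80 and works on both.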

I tried modifying the vendor string, removed the "overwrite the vendor
string with the native string" hack and things look like they work
just fine with Linux.

Unfortunately right now I don't have a 64-bit Windows installation
around to check if that approach works there too, but if it does and
no known OS breaks due to the invalid vendor string, we can just
create our own virtual CPU string, no?
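
For reference, a small sketch of how the 12-byte vendor string maps onto
the registers returned by CPUID leaf 0 (EBX, EDX, ECX, in that order,
four little-endian ASCII bytes each). The helper names and the
"VirtualCPU64" string are hypothetical and only illustrate what a custom
vendor string would look like at the register level.

```python
import struct


def vendor_to_regs(vendor: str) -> tuple:
    """Split a 12-character vendor string into (ebx, edx, ecx) for CPUID leaf 0."""
    raw = vendor.encode("ascii")
    if len(raw) != 12:
        raise ValueError("vendor string must be exactly 12 ASCII characters")
    return struct.unpack("<3I", raw)


def regs_to_vendor(ebx: int, edx: int, ecx: int) -> str:
    """Rebuild the vendor string the way a guest OS reads it."""
    return struct.pack("<3I", ebx, edx, ecx).decode("ascii")


# Hypothetical custom string, not an existing QEMU/KVM default.
custom_regs = vendor_to_regs("VirtualCPU64")
```

A guest that only compares the string against "GenuineIntel" and
"AuthenticAMD" would treat such a value as an unknown vendor.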

qemu has an option for that, -cpu qemu64 IIRC. As long as we expose
practically correct cpuids and MSRs, this should be fine. I've not
tested qemu64 with winxp x64 though. Also, last I knew, winxp x64
installation didn't succeed with --no-kvm. qemu by default exposes an
AMD CPU type.

I wasn't talking about CPUID features, but the vendor string. Qemu64
provides the AuthenticAMD string, so we don't run into any issues I'm
presuming.

Right -- the thing is, with the default AuthenticAMD string, winxp x64
installation fails. That has to be because of some missing cpuids.
That's one of the drawbacks of exposing a well-known CPU type. I was
suggesting we should try out the -cpu qemu64 CPU type, since it exposes
a non-standard CPU, to see if guests and most userspace programs work
fine without any further tweaking -- see the 'cons' below for why this
might be a problem.

I still don't really understand what you're trying to say - qemu64 is the default in KVM right now. You mean winxp64 installation doesn't work as is and we should fix it? This has nothing to do with the migration problems, right?



There are pros and cons to exposing a custom vendor ID:

pros:
- We don't need to have all the cpuid features exposed which are
  expected of a physically available CPU in the market, for example,
  badly-coded applications might crash if we don't have SSSE3 on a
  Core2Duo. But badly-coded or not, not exposing what's actually
  available on every C2D out there is bad.

cons:
- To expose the "correct" set of feature bits for a known processor,
  we also need to check the family/model/stepping to support the exact
  same feature bits that were present in the CPU.
- We might not get some optimizations that OSes might have based on
  CPU type, even if the host CPU qualifies for such optimizations
- Standard programs like benchmarking tools, etc., might fail if they
  depend on the vendor string for their functionality

For 32-bit guests, I think exposing a pentium4 or Athlon CPU type
should be fine. For 64-bit guests, the newer the better.

Well, we could create different CPU definitions:

- migration safe (do what is safe for migration)

There are multiple ways of approaching this: peg to a least-known good CPU type, all of whose instructions will work on processors from both the major vendors. However, you never know how the server pools change and you'd want to upgrade the CPU type once you know the CPUs that are installed in servers. This has to be dynamic and the management application has to take care of exposing a CPU that's of a "safe" type for the particular server pool. We have to provide ways to mask off CPUID bits as requested by the management application. (Each server sends its cpuid to the management application, which calculates the safest bits and then conveys this to each server before starting a VM.)
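
The "calculate the safest bits" step could be sketched as a bitwise AND
over the feature words reported by every server in the pool. This is
only an illustration: the function name and the example pool are
hypothetical, and the constants are bit positions in CPUID leaf 1 ECX.

```python
from functools import reduce

# Feature bits in CPUID.1:ECX.
SSE3 = 1 << 0
SSSE3 = 1 << 9
SSE4_1 = 1 << 19
SSE4_2 = 1 << 20


def safe_feature_mask(host_masks):
    """Return the feature bits that every host in the pool supports."""
    masks = list(host_masks)
    if not masks:
        raise ValueError("need at least one host")
    return reduce(lambda a, b: a & b, masks)


# Hypothetical server pool: one feature word per host.
pool = [
    SSE3 | SSSE3 | SSE4_1 | SSE4_2,
    SSE3 | SSSE3 | SSE4_1,
    SSE3 | SSSE3,
]
mask = safe_feature_mask(pool)  # bits the guest may safely be shown
```

Here the guest would see SSE3 and SSSE3 only; adding an older host to
the pool shrinks the mask further, which is why the result has to be
recomputed whenever the pool changes.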

IMHO we shouldn't really start to be smart here. There's only so much benefit in using the lowest common denominator between all CPUs in the datacenter vs. using the lowest common denominator between all possible CPUs. You'll basically end up enabling some newer SSE instructions.

So I don't think we need to go through the hassle of making this dynamic. If you want to migrate your machines - use the migrate preset. That won't give you the 150% speed boost on video encoding, but should not really be any slower on normal workloads. It does make things a lot more transparent to us and the admin of a network though, because you know what you'll end up with when using "-cpu migration".

- CPU specific (like a Core2Duo, necessary to run Mac OS X)

This doesn't need any more work -- we already have the ability to select CPU types. If the management application has knowledge of the kind of OS being installed in a VM (which these days is true), exposing a Core2Duo for a Mac-based OS isn't difficult.

There is no sysenter emulation for IA-32e on AMD yet, right? That's the only issue I see here and your emulation patch should address that.

- host (fastest possible, but no migration)

This should be the default.

I'm not sure. Either host or migration should be the default. This actually depends on the workload you have on KVM. For servers you'll probably want to have migration be the default. For desktop usage it's host. I can't think of a way we can be smart about that on the KVM level.



I don't think we could find one definition that fits all, so the user
would have to define what the usage pattern will be.

I'd love to hear comments and suggestions on this and hope we'll end up in a fruitful discussion on how to improve the current situation.

I have a patch ready for emulating sysenter/sysexit on AMD systems
(needs testing). Patching the guest was an option that was discouraged;
I had a hack ready but it was quickly shelved (again, untested).

That sounds useful for misbehaving guests or cases I haven't thought
of yet. Are you sure you're intercepting the SYSENTER MSRs on AMD, so
you don't end up only getting 32 bits?

Can you elaborate?

When you write to MSR_IA32_SYSENTER_EIP on AMD, that MSR will be directly passed through to the hardware (search for that MSR in svm.c). This is because SVM automatically writes the SYSENTER MSRs to the SYSENTER fields in the VMCB.

Now, AMD only implements 32 bits here (actually Intel simply extended it to 64 bits without coordinating with AMD). So if you use SYSENTER on an AMD VM now, only 32 bits of that IP will be stored in the actual hardware's MSR, and thus in the VMCB, which means that you won't be able to boot a 64-bit Linux system with the "GenuineIntel" vendor specification on SVM.
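
The truncation can be shown with plain arithmetic. This is a sketch
only: the entry address is a made-up example of a 64-bit kernel address,
and stored_sysenter_eip merely models an MSR that keeps 32 bits.

```python
MASK32 = 0xFFFFFFFF


def stored_sysenter_eip(value: int) -> int:
    """Model an MSR that only keeps the low 32 bits of what was written."""
    return value & MASK32


# Hypothetical 64-bit kernel entry point, chosen for illustration only.
guest_entry = 0xFFFFFFFF80212340
kept = stored_sysenter_eip(guest_entry)

# The upper half is lost, so SYSENTER would jump to the wrong address.
lost_upper_bits = guest_entry != kept
```

An emulation that intercepts writes to the SYSENTER MSRs could keep the
full 64-bit value in software instead of relying on what the hardware
stores.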

I just wanted to point that out, so you can handle it in the emulation that you'll hopefully post to the list even though it's preliminary, so everyone can contribute to and comment on it.

Alex