On Mon, Jun 02, 2014 at 11:01:07PM +0200, Alexander Graf wrote:
>
>
> > On 02.06.2014 at 22:41, "Michael S. Tsirkin" <[email protected]> wrote:
> >
> >> On Mon, Jun 02, 2014 at 10:35:56PM +0200, Alexander Graf wrote:
> >>
> >>
> >>>> On 02.06.2014 at 22:20, "Michael S. Tsirkin" <[email protected]> wrote:
> >>>>
> >>>> On Mon, Jun 02, 2014 at 09:48:19PM +0200, Alexander Graf wrote:
> >>>>
> >>>>
> >>>>>> On 02.06.2014 at 21:25, "Gabriel L. Somlo" <[email protected]> wrote:
> >>>>>>
> >>>>>> On Wed, May 07, 2014 at 04:52:13PM -0400, Gabriel L. Somlo wrote:
> >>>>>> Treat monitor and mwait instructions as nop, which is architecturally
> >>>>>> correct (but inefficient) behavior. We do this to prevent misbehaving
> >>>>>> guests (e.g. OS X <= 10.7) from crashing after they fail to check for
> >>>>>> monitor/mwait availability via cpuid.
> >>>>>>
> >>>>>> Since mwait-based idle loops relying on these nop-emulated instructions
> >>>>>> would keep the host CPU pegged at 100%, do NOT advertise their presence
> >>>>>> via cpuid, to prevent compliant guests from using them inadvertently.
> >>>>>>
> >>>>>> Signed-off-by: Gabriel L. Somlo <[email protected]>
> >>>>>> ---
> >>>>>>
> >>>>>> New in v2: remove invalid_op handler functions which were only used to
> >>>>>> handle exits caused by monitor and mwait
> >>>>>>
> >>>>>>>> On Wed, May 07, 2014 at 08:31:27PM +0200, Alexander Graf wrote:
> >>>>>>>> On 05/07/2014 08:15 PM, Michael S. Tsirkin wrote:
> >>>>>>>> If we really want to be paranoid and worry about guests
> >>>>>>>> that use this strange way to trigger invalid opcode,
> >>>>>>>> we can make it possible for userspace to enable/disable
> >>>>>>>> this hack, and teach qemu to set it.
> >>>>>>>>
> >>>>>>>> That would make it even safer than it was.
> >>>>>>>>
> >>>>>>>> Not sure it's worth it, just a thought.
> >>>>>>>
> >>>>>>> Since we don't trap on non-exposed other instructions (new SSE and
> >>>>>>> whatdoiknow) I don't think it's really bad to just expose
> >>>>>>> MONITOR/MWAIT as nops.
> >>>>>
> >>>>> Would it make sense to make this a module parameter,
> >>>>> (e.g., "int emulate_mwait") ?
> >>>>>
> >>>>> Default would be 0 (no emulation). 1 would mean "emulate as nop", and
> >>>>> if anyone ever figures out how to do proper page-locking based
> >>>>> emulation we could use 2 to enable that, etc. ?
> >>>>>
> >>>>> Not sure we'd want qemu to enable/disable it automatically, though...
> >>>>>
> >>>>> What do you all think ?
> >>>>
> >>>> I don't like module parameters - they're system global and there's a
> >>>> good chance you want to run non-osx in parallel ;).
> >>>>
> >>>> I'd either link this to the cpuid bits or enable it forcefully through
> >>>> ENABLE_CAP per vcpu.
> >>>>
> >>>> Alex
> >>>
> >>> Point is that.
> >>> Paolo here thinks it's safe to just make it a NOP unconditionally.
> >>> so module parameter would be there as a debugging tool:
> >>> as a means for users to test with old kvm behaviour if they see breakage.
> >>> Which we don't expect, so no need to waste cycles creating a pretty
> >>> interface for it.
> >>
> >> Both interfaces already exist, so where's the problem?
> >
> > Hmm sorry which interfaces for enabling mwait nop emulation exist?
>
> User space can force cpuid bits that kvm doesn't return as supported, so we
> do have a negative-by-default switch.
>
> We also have an ENABLE_CAP ioctl. Enabling the monitor/mwait nop ability
> explicitly by that is a 5 line patch.
>
> Either way is very flexible and not system wide.

W.r.t. monitor/mwait, a guest can do one of the following:

  1. Never check CPUID, and never use monitor/mwait

        - This is great, we don't have to do anything about these

  2. Check CPUID for mwait, use it to idle in preference over hlt

        - Linux, Windows, and Mavericks (10.9) do this

        - we never want to have CPUID say "yes" to these, since
          monitor/mwait support will be clunky in the best case,
          and hlt is overwhelmingly preferable! [*]

  3. Never check CPUID, use monitor/mwait with abandon

        - OS X 10.6 .. 10.8 does this

        - emulating monitor/mwait here allows us to boot the guest
          and use it, and perform sysadmin surgery to force a hlt
          based idle

  4. Check CPUID, panic if unavailable

        - OS X 10.5 did this, IIRC.

        - whether I can do kext surgery and get it to stop checking
          CPUID *in addition to* falling back to hlt-based idle is
          TBD.

        - emulating monitor/mwait allows us to boot this type of
          guest, BUT WE ALSO HAVE TO ADVERTISE IT VIA CPUID !!!

I like telling qemu on the command line "do monitor = mwait = nop;
for this guest only", and having qemu pass that on to KVM for only the
VCPUs associated with this guest, optionally, for cases 3 and 4 only
(everyone else gets the invalid opcode fault behavior as before).
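
To make the per-guest idea concrete, here is a rough sketch of the policy I have in mind. Everything named here (mwait_mode, handle_monitor_mwait, advertise_mwait) is made up for illustration; none of it is existing KVM or qemu code, and the real thing would hang off the vcpu state:

```c
#include <stdbool.h>

/* Hypothetical per-guest setting; not an existing KVM or qemu interface. */
enum mwait_mode {
    MWAIT_FAULT,      /* default: monitor/mwait raise invalid opcode, as before */
    MWAIT_NOP,        /* case 3: execute as nop, CPUID bit stays clear */
    MWAIT_NOP_CPUID,  /* case 4: execute as nop AND advertise via CPUID */
};

/* What the exit handler would do with a trapped monitor or mwait. */
enum insn_result {
    INSN_INJECT_UD,   /* inject the invalid opcode fault into the guest */
    INSN_SKIP,        /* skip the instruction, i.e. treat it as a nop */
};

enum insn_result handle_monitor_mwait(enum mwait_mode mode)
{
    return mode == MWAIT_FAULT ? INSN_INJECT_UD : INSN_SKIP;
}

/* Only guests that panic without the CPUID bit (case 4) get it set. */
bool advertise_mwait(enum mwait_mode mode)
{
    return mode == MWAIT_NOP_CPUID;
}
```

Cases 1 and 2 stay on MWAIT_FAULT, so compliant guests never see the CPUID bit and keep idling with hlt.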

[*] I think we've been over this a few times already, but here's a
quick recap:

  - monitor == mwait == NOP is correct (albeit wasteful) behavior

        - mwait MUST expect and deal with spurious wakeups
          (per the Intel manual)

        - mwait == nop is an INSTANT spurious wakeup (hence
          works OK with any correctly written program) !

        - monitor == nop won't "arm" anything, but that
          doesn't matter if mwait always immediately wakes up !

        - this pegs the host CPU to 100%, so MUCH worse than
          hlt, shouldn't do it unless we ABSOLUTELY HAVE TO !!!

  - guest-mode mwait should NEVER be allowed to stop the host CPU
    (and, according to the Intel manual, it's HARD to try and
    make it do so, which I think is on purpose !)

        - instead, guest-mode mwait should map to a host-side
          condition-wait (where a write to a monitor-ed area
          acts as condition-signal).

        - the most likely way to implement something like that
          would be to write-protect pages and handle write faults

                - and I never got it working *properly* (but I'm a n00b,
                  so that ain't saying much :)

                - but the granularity would be all wrong compared to any
                  real CPU (1 page >> typical monitored area size)

        - but I still don't see it being any better than
          hlt-based idle, even if we *did* get it to work correctly !!!
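
For what it's worth, here is a small userspace model of why mwait == nop is harmless to a correctly written idle loop. It is a single-threaded simulation, not guest code: monitor_nop, mwait_nop, and the other_cpu callback are stand-ins I made up, with the callback playing the role of another CPU writing to the monitored location:

```c
/* Stand-ins for the two instructions, both modeled as nops. */
void monitor_nop(const volatile void *addr) { (void)addr; }
void mwait_nop(void) { }

/* Simulation hook: gives the "other CPU" a chance to write the flag. */
typedef void (*step_fn)(volatile int *flag, unsigned long pass);

/*
 * Idle until *flag becomes nonzero, re-checking after every wakeup as a
 * correct mwait loop must. Returns how many wakeups found the flag
 * still clear, i.e. how many were spurious.
 */
unsigned long idle_until_set(volatile int *flag, step_fn other_cpu)
{
    unsigned long spurious = 0;

    while (!*flag) {
        monitor_nop(flag);          /* real hardware would arm the monitor */
        if (*flag)                  /* re-check so a write is never missed */
            break;
        mwait_nop();                /* returns at once: instant wakeup */
        other_cpu(flag, spurious);  /* simulation only */
        if (!*flag)
            spurious++;             /* woke up with nothing to do */
    }
    return spurious;
}

/* Simulated writer: sets the flag on the fourth pass (pass == 3). */
void set_on_fourth_pass(volatile int *flag, unsigned long pass)
{
    if (pass == 3)
        *flag = 1;
}

/* Runs the model; returns the spurious wakeup count, or -1 on failure. */
long run_demo(void)
{
    volatile int flag = 0;
    unsigned long n = idle_until_set(&flag, set_on_fourth_pass);

    return flag ? (long)n : -1;
}
```

The loop ends as soon as the flag is written, no matter how many times mwait returned early. The cost is that every early return is a wasted pass, which on a real host means a CPU spinning at 100%.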

I'll look into ENABLE_CAP, and how to expose that on the qemu command
line (I think I might need both methods mentioned by Alex in tandem,
but I'll have to study existing examples before I can say anything
useful here). Any extra words of wisdom on how to do that, or on which
existing examples are best to study for inspiration, would be much
appreciated!
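
From a first look at linux/kvm.h, the userspace half of ENABLE_CAP would be roughly the sketch below. The helper name enable_vcpu_cap is made up, and the capability number is left as a parameter because no capability for mwait-as-nop exists yet; whether a given kernel accepts KVM_ENABLE_CAP on an x86 vcpu fd is something I have not checked:

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Enable capability `cap` on one vcpu file descriptor.
 * `cap` comes from the caller: there is no capability number for
 * mwait-as-nop today, so any value passed here is hypothetical.
 * Returns 0 on success, -1 on failure (errno is set by ioctl()).
 */
int enable_vcpu_cap(int vcpu_fd, uint32_t cap, uint64_t arg0)
{
    struct kvm_enable_cap ec;

    memset(&ec, 0, sizeof(ec));  /* leave flags and padding zeroed */
    ec.cap = cap;
    ec.args[0] = arg0;

    return ioctl(vcpu_fd, KVM_ENABLE_CAP, &ec) < 0 ? -1 : 0;
}
```

qemu would call something like this once per vcpu of the guest that asked for it, which is what keeps the setting from being system wide.
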
Thanks,
--Gabriel