Re: [PATCH 3/5] booke: define reset and shutdown hcalls

Gleb Natapov Wed, 17 Jul 2013 04:00:58 -0700

On Tue, Jul 16, 2013 at 06:04:34PM -0500, Scott Wood wrote:
> On 07/16/2013 01:35:55 AM, Gleb Natapov wrote:
> >On Mon, Jul 15, 2013 at 01:17:33PM -0500, Scott Wood wrote:
> >> On 07/15/2013 06:30:20 AM, Gleb Natapov wrote:
> >> >There is not much sense in sharing hypercalls between architectures.
> >> >There is zero probability x86 will implement those, for instance
> >>
> >> This is similar to the question of whether to keep device API
> >> enumerations per-architecture...  It costs very little to keep it in
> >> a common place, and it's hard to go back in the other direction if
> >> we later realize there are things that should be shared.
> >>
> >This is different from device API since with device API all arches
> >have to create/destroy devices, so it makes sense to put device
> >lifecycle management into the common code, and device API has a single
> >entry point to the code - device fd ioctl - where it makes sense to
> >handle common tasks, if any, and dispatch others to the specific device
> >implementation.
> >
> >This is totally unlike hypercalls which are, by definition, very
> >architecture specific (the way they are triggered, the way parameters
> >are passed from guest to host, what hypercalls an arch needs...).
> 
> The ABI is architecture specific.  The API doesn't need to be, any
> more than it does with syscalls (I consider the
> architecture-specific definition of syscall numbers and similar
> constants in Linux to be unfortunate, especially for tools such as
> strace or QEMU's linux-user emulation).
> 
Unlike with syscalls, different arches have very different ideas about
which hypercalls they need to implement. So while I can see how a unified
syscall space may benefit a (very) small number of tools, I do not see
what advantage a unified hypercall space would give us. The disadvantage
is one more global namespace to manage.


> >> Keeping it in a common place also makes it more visible to people
> >> looking to add new hcalls, which could cut down on reinventing the
> >> wheel.
> >I do not want other arches to start using hypercalls in the way
> >powerpc started to use them: separate device io space, so it is better
> >to hide this as far away from common code as possible :) But on a more
> >serious note, hypercalls should be a last resort and added only when no
> >other possibility exists, so people should not look at what hcalls
> >others implemented so that they can add them to their favorite arch;
> >they should have a problem at hand that they cannot solve without an
> >hcall, but at that point they will have a pretty good idea what this
> >hcall should do.
> 
> Why are hcalls such a bad thing?
> 
Because they are often used to do non-architectural things, making OSes
behave differently from how they run on real HW, and real HW is what
OSes are designed and tested for. Example: there once was a KVM hypercall
(XEN has/had a similar one) to accelerate MMU operations. One thing it
allowed was to flush the TLB without doing an IPI if the vcpu is not
running. Later an optimization was added to the Linux MMU code that
_relies_ on those IPIs for synchronisation. Good thing that at that point
those hypercalls were already deprecated on KVM (IIRC XEN was broken for
some time in that regard).

Which brings me to another point: they often get obsoleted by code
improvements and HW advancement (this happened to the aforementioned MMU
hypercalls), but they are hard to deprecate if the hypervisor supports
live migration; without live migration it is less of a problem.

Next point is that people often try to use them instead of emulating a PV
or real device just because they think it is easier, but often it is
not. Example: the pvpanic device was initially proposed as a hypercall,
so let's say we had implemented it as such. It would have been KVM
specific, and the implementation would have touched core guest KVM code
and would have been Linux guest specific. Instead it was implemented as a
platform device with a very small platform driver confined to the
drivers/ directory, immediately usable by XEN and QEMU tcg in addition to
KVM, and it will likely gain a Windows driver. No downsides, only upsides.

So given all that, hypercalls are considered more of a necessary evil in
KVM land :)

> Should new Linux syscalls be avoided too, in favor of new emulated
> devices exposed via vfio? :-)
Try to add a new syscall to Linux and see how simple it is.

> 
> >> >(not sure why PPC will want them either instead of emulating
> >> >devices that do
> >> >shutdown/reset).
> >>
> >> Besides what Alex said, for shutdown we don't have any existing
> >> device to emulate (our real hardware just doesn't have that
> >> functionality).  For reset we currently do emulate, but it's awkward
> >> to describe in the device tree what we actually emulate since the
> >> reset functionality is part of a kitchen-sink "device" of which we
> >> emulate virtually nothing other than the reset.  Currently we
> >> advertise the entire thing and just ignore the rest, but that causes
> >> problems with the guest seeing the node and trying to use that
> >> functionality.
> >>
> >What about writing a virtio device for shutdown
> 
> That sounds like quite a bit more work than hcalls.  It also ties up
> a virtual PCI slot -- some machines don't have very many (mpc8544ds
> has 2, though we could and should expand that in the paravirt e500
> machine).
Yes, a virtio device may be more work, but it does not need to be a
complex or high-performance device; having only one outstanding command
will be OK. The 2-slot limit is too harsh indeed, but since an hcall
implies PV, the device may be available only on the paravirt machine. And
the device's functionality can be expandable, so you will not need to
write another one and take another slot for each little thing you want to
add. It can advertise capabilities in one BAR and take commands/return
values through the virtio ring.
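
To illustrate the last point, here is a rough sketch of the host side of
such a device. This is only a sketch under assumptions, not a proposed
ABI: the pvctl_* names, the capability bits, the command numbers and the
status codes are all made up for this mail and do not correspond to any
existing virtio device. The ring itself is left out; pvctl_handle() stands
for what the host would do with the single outstanding request.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits, advertised read-only (e.g. in a BAR). */
#define PVCTL_CAP_SHUTDOWN (1u << 0)
#define PVCTL_CAP_RESET    (1u << 1)

/* Hypothetical command codes; command N is guarded by capability bit N. */
enum pvctl_cmd { PVCTL_CMD_SHUTDOWN = 0, PVCTL_CMD_RESET = 1 };

/* Hypothetical status codes returned to the guest. */
enum pvctl_status { PVCTL_OK = 0, PVCTL_ENOTSUP = 1 };

/* One request/response pair; only one is outstanding at a time. */
struct pvctl_req {
	uint32_t cmd;    /* filled in by the guest */
	uint32_t status; /* filled in by the host */
};

struct pvctl_dev {
	uint32_t caps;   /* what this host implements */
	int shutdown_requested;
	int reset_requested;
};

/* Host side: handle a single request taken from the ring. */
void pvctl_handle(struct pvctl_dev *dev, struct pvctl_req *req)
{
	assert(dev && req);

	/* Reject out-of-range commands and commands not advertised. */
	if (req->cmd >= 32 || !(dev->caps & (1u << req->cmd))) {
		req->status = PVCTL_ENOTSUP;
		return;
	}

	switch (req->cmd) {
	case PVCTL_CMD_SHUTDOWN:
		dev->shutdown_requested = 1;
		break;
	case PVCTL_CMD_RESET:
		dev->reset_requested = 1;
		break;
	default:
		/* Advertised bit without a handler: treat as unsupported. */
		req->status = PVCTL_ENOTSUP;
		return;
	}
	req->status = PVCTL_OK;
}
```

Adding a new function later would mean one more capability bit and one
more case in the switch, not another device and another slot.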

--
                        Gleb.