On 2020-03-22 14:20, Lev Aronsky wrote:
On Sun, Mar 22, 2020 at 11:34:35AM +0000, Marc Zyngier wrote:
Hi Lev,

Thanks for this.

On 2020-03-22 09:36, aron...@gmail.com wrote:
> From: Lev Aronsky <aron...@gmail.com>
>
> As per ARM DDI 0487E.a, section D12.3.2, there can be various
> system registers that are IMPLEMENTATION DEFINED.
>
> So far, KVM would inject an undefined instruction into the guest,
> whenever access to an implementation defined system register (from
> here on referred to as IDSR) was trapped. This makes sense, since a
> general-purpose OS probably shouldn't rely on the existence of IDSRs.
>
> This patch introduces an option to give userspace a chance to handle
> these traps. This can be used to emulate specific architectures, and
> virtualize operating systems that rely on the existence of specific
> IDSRs.

Do you have an example of such operating systems? Also, do you have
an example where userspace could actually do something useful about
such access, other than maybe treating it as RAZ/WI?

I ran into the issue when working on our company's project, which aims
to emulate Apple's iOS under QEMU. While emulation currently works
nicely, we were looking to improve performance by using hardware
virtualization, and that's when we ran into the issue of
implementation-defined system registers, since iOS uses those. Frankly,
in our case, we don't really know the purpose of those registers, as far
as the iOS kernel is concerned. When emulating them, we treat them as
simple 64-bit storage areas. It's possible that treating them as RAZ/WI
would work, as well.

It's not really reassuring, is it? :-/

> Similarly to the recently introduced NISV exits, this is an ABI to
> userspace, and comes with a matching new capability that allows the
> configuration of the behavior.

These are different issues: one is a shortcoming of the architecture,
the other a shortcoming (or hyperspecialization) of the guest OS.

You're correct. I mentioned the NISV exits because my code is modeled
after them, and was hoping it would make following my changes easier.

It does. IT is just that wou did seem to conflate the two things, which
triggered an allergic reaction... ;-)

> This patch introduces KVM_CAP_ARM_IDSR_TO_USER, which can be used to
> enable userspace exits due to access to IDSRs. Additionally, it
> introduces a matching new exit reason, KVM_EXIT_ARM_IDSR, to report
> those events to userspace. Userspace can choose to emulate the access
> based on the architecture requirements, or refuse to emulate it and

s/architecture/implementation/

Sorry for the newbie question - but how do I change the description now?
I will upload an updated patch, based on your following comments, but
I'm not sure how to change the original commit description the correct
way.

git commit --amend


> let the kernel continue with the default behavior (injecting an
> undefined instruction exception into the guest).
>
> Signed-off-by: Lev Aronsky <aron...@gmail.com>
> ---
>  arch/arm64/include/asm/kvm_coproc.h |  1 +
>  arch/arm64/include/asm/kvm_host.h   |  6 +++
>  arch/arm64/kvm/sys_regs.c           | 66 ++++++++++++++++++++++++++++-
>  include/uapi/linux/kvm.h            | 14 ++++++
>  tools/include/uapi/linux/kvm.h      | 14 ++++++
>  virt/kvm/arm/arm.c                  | 11 +++++
>  6 files changed, 111 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_coproc.h
> b/arch/arm64/include/asm/kvm_coproc.h
> index 0185ee8b8b5e..34b45efffe52 100644
> --- a/arch/arm64/include/asm/kvm_coproc.h
> +++ b/arch/arm64/include/asm/kvm_coproc.h
> @@ -33,6 +33,7 @@ int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct
> kvm_run *run);
>  int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run);
> +int kvm_handle_idsr_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
>
>  #define kvm_coproc_table_init kvm_sys_reg_table_init
>  void kvm_sys_reg_table_init(void);
> diff --git a/arch/arm64/include/asm/kvm_host.h
> b/arch/arm64/include/asm/kvm_host.h
> index d87aa609d2b6..951c7e6fec8b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -91,6 +91,12 @@ struct kvm_arch {
>     * supported.
>     */
>    bool return_nisv_io_abort_to_user;
> +
> +  /*
> +   * If we encounter an access to an implementation-defined system
> +   * register, report this to user space.
> +   */
> +  bool return_idsr_to_user;

If we are going to add flags like this, maybe we'd be better off having
actual flags as an unsigned long.

I don't mind either way - as I mentioned, I tried to follow the NISV
exits style, and therefore used `bool`. Should I keep it, change it to
`unsigned long`, or change both mine and the NISV flag to `unsigned
long`?

Change return_nisv_io_abort_to_user to "unsigned long flags", add a new
RETURN_NISV_IO_ABORT_TO_USER flag and repaint all the current users
in a preliminary patch.


>  };
>
>  #define KVM_NR_MEM_OBJS     40
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 3e909b117f0c..0c408546ed7c 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -2233,6 +2233,16 @@ int kvm_handle_cp14_32(struct kvm_vcpu *vcpu,
> struct kvm_run *run)
>                            NULL, 0);
>  }
>
> +static int handle_imp_def_sys_reg(struct kvm_vcpu *vcpu,
> +                            struct sys_reg_params *params)
> +{
> +  if (vcpu->kvm->arch.return_idsr_to_user)
> +          return 0;
> +
> +  kvm_inject_undefined(vcpu);
> +  return 1;
> +}
> +
>  static bool is_imp_def_sys_reg(struct sys_reg_params *params)
>  {
>    // See ARM DDI 0487E.a, section D12.3.2
> @@ -2255,7 +2265,7 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
>    if (likely(r)) {
>            perform_access(vcpu, params, r);
>    } else if (is_imp_def_sys_reg(params)) {
> -          kvm_inject_undefined(vcpu);
> +          return handle_imp_def_sys_reg(vcpu, params);
>    } else {
>            print_sys_reg_msg(params,
>                              "Unsupported guest sys_reg access at: %lx 
[%08lx]\n",
> @@ -2309,9 +2319,63 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu,
> struct kvm_run *run)
>
>    if (!params.is_write)
>            vcpu_set_reg(vcpu, Rt, params.regval);
> +
> +  if (unlikely(!ret)) {
> +          run->arm_idsr.Op0 = params.Op0;
> +          run->arm_idsr.Op1 = params.Op1;
> +          run->arm_idsr.CRn = params.CRn;
> +          run->arm_idsr.CRm = params.CRm;
> +          run->arm_idsr.Op2 = params.Op2;
> +          run->arm_idsr.regval = params.regval;
> +          run->arm_idsr.is_write = params.is_write;
> +          run->arm_idsr.is_needed = 1;
> +          run->arm_idsr.is_handled = 0;

This smells of a state machine. Please document the various states.

This follows the states from MMIO. However, in MMIO, the state is split
between the kvm_run structure and the kvm_vcpu structure (mmio_needed
field). But MMIO is much more widely used, compared to this new feature.
I didn't feel like polluting the kvm_vcpu structure with minor flags,
and chose to put the is_needed flag inside kvm_run.

is_handled is my own addition. Where would be a good place to document
this? Here, or in the header file?

The header file would be the right place indeed.


> +
> +          run->exit_reason = KVM_EXIT_ARM_IDSR;
> +  }
> +
>    return ret;
>  }
>
> +/**
> + * kvm_handle_idsr_return -- Handle syncing register state after user
> + *                           user space emulation
> + *
> + * @vcpu: The VCPU pointer
> + * @run:  The VCPU run struct containing the system register data
> + */
> +int kvm_handle_idsr_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> +  int Rt = kvm_vcpu_sys_get_rt(vcpu);
> +
> +  /* Detect an already handled IDSR return */
> +  if (unlikely(!run->arm_idsr.is_needed))
> +          return 0;
> +
> +  run->arm_idsr.is_needed = 0;
> +
> +  if (!run->arm_idsr.is_handled) {
> +          /*
> +           * User space couldn't handle this register - revert to
> +           * default behavior, which is injecting an undefined
> +           * instruction exception into the guest.
> +           */
> +          kvm_inject_undefined(vcpu);
> +          return 0;
> +  }
> +
> +  if (!run->arm_idsr.is_write)
> +          vcpu_set_reg(vcpu, Rt, run->arm_idsr.regval);
> +
> +  /*
> +   * The access to the implementation-defined system register is
> emulated
> +   * and should not be re-executed in the guest.
> +   */
> +  kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> +
> +  return 0;
> +}
> +
> 
/******************************************************************************
>   * Userspace API
> *****************************************************************************/
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4b95f9a31a2f..520aa8c1731e 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -236,6 +236,7 @@ struct kvm_hyperv_exit {
>  #define KVM_EXIT_IOAPIC_EOI       26
>  #define KVM_EXIT_HYPERV           27
>  #define KVM_EXIT_ARM_NISV         28
> +#define KVM_EXIT_ARM_IDSR         29
>
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -400,6 +401,18 @@ struct kvm_run {
>                    __u64 esr_iss;
>                    __u64 fault_ipa;
>            } arm_nisv;
> +          /* KVM_EXIT_ARM_IDSR */
> +          struct /* based on sys_reg_params */ {

Well, not quite sys_reg_params. You have different fields here.

True. Should I drop the mention of sys_reg_params altogether, then?

Yes, please.


> +                  __u8 Op0;
> +                  __u8 Op1;
> +                  __u8 CRn;
> +                  __u8 CRm;
> +                  __u8 Op2;
> +                  __u64 regval;
> +                  __u8 is_write;
> +                  __u8 is_needed; /* Set by kvm on exit */
> +                  __u8 is_handled; /* Set by user space on entry */

Please document the valid values.

Will do.

> +          } arm_idsr;
>            /* Fix the size of the union. */
>            char padding[256];
>    };
> @@ -1010,6 +1023,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_ARM_NISV_TO_USER 177
>  #define KVM_CAP_ARM_INJECT_EXT_DABT 178
>  #define KVM_CAP_S390_VCPU_RESETS 179
> +#define KVM_CAP_ARM_IDSR_TO_USER 180
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> diff --git a/tools/include/uapi/linux/kvm.h
> b/tools/include/uapi/linux/kvm.h
> index f0a16b4adbbd..d9830ca92e35 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h

In general, we leave the tools maintainer to update their copy of the UAPI
independently.

I see. I will revert the changes under uapi.

Well, under tools/include/uapi. You really want to keep the include/uapi
ones.


> @@ -236,6 +236,7 @@ struct kvm_hyperv_exit {
>  #define KVM_EXIT_IOAPIC_EOI       26
>  #define KVM_EXIT_HYPERV           27
>  #define KVM_EXIT_ARM_NISV         28
> +#define KVM_EXIT_ARM_IDSR         29
>
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -400,6 +401,18 @@ struct kvm_run {
>                    __u64 esr_iss;
>                    __u64 fault_ipa;
>            } arm_nisv;
> +          /* KVM_EXIT_ARM_IDSR */
> +          struct /* sys_reg_params */ {
> +                  __u8 Op0;
> +                  __u8 Op1;
> +                  __u8 CRn;
> +                  __u8 CRm;
> +                  __u8 Op2;
> +                  __u64 regval;
> +                  __u8 is_write;
> +                  __u8 is_aarch32;
> +                  __u8 is_32bit;  /* Only valid if is_aarch32 is true */
> +          } arm_idsr;

Funnily enough, the above doesn't match the include/uapi one!

>            /* Fix the size of the union. */
>            char padding[256];
>    };
> @@ -1009,6 +1022,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176
>  #define KVM_CAP_ARM_NISV_TO_USER 177
>  #define KVM_CAP_ARM_INJECT_EXT_DABT 178
> +#define KVM_CAP_ARM_IDSR_TO_USER 180
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index d65a0faa46d8..3e5eb0fadb0c 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -87,6 +87,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>            r = 0;
>            kvm->arch.return_nisv_io_abort_to_user = true;
>            break;
> +  case KVM_CAP_ARM_IDSR_TO_USER:
> +          r = 0;
> +          kvm->arch.return_idsr_to_user = true;
> +          break;
>    default:
>            r = -EINVAL;
>            break;
> @@ -195,6 +199,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm,
> long ext)
>    case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
>    case KVM_CAP_ARM_NISV_TO_USER:
>    case KVM_CAP_ARM_INJECT_EXT_DABT:
> +  case KVM_CAP_ARM_IDSR_TO_USER:
>            r = 1;
>            break;
>    case KVM_CAP_ARM_SET_DEVICE_ADDR:
> @@ -656,6 +661,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
> *vcpu, struct kvm_run *run)
>                    return ret;
>    }
>
> +  if (run->exit_reason == KVM_EXIT_ARM_IDSR) {
> +          ret = kvm_handle_idsr_return(vcpu, vcpu->run);
> +          if (ret)
> +                  return ret;
> +  }
> +
>    if (run->immediate_exit)
>            return -EINTR;

What's seem to be missing is the similar handling of 32bit CP15 accesses,
as HCR_EL2.TIDCP also traps these (from ARMv8 ARM E_a):

<quote>
In AArch32 state, MCR and MRC access to instructions with the following
encodings are
trapped and reported using EC syndrome 0x03:
— All coproc==p15, CRn==c9, opc1 == {0-7}, CRm == {c0-c2, c5-c8}, opc2 ==
{0-7}.
— All coproc==p15, CRn==c10, opc1 =={0-7}, CRm == {c0, c1, c4, c8}, opc2 ==
{0-7}.
— All coproc==p15, CRn==c11, opc1=={0-7}, CRm == {c0-c8, c15}, opc2 ==
{0-7}.
</quote>

There is also the case of EL0 accesses, for which the ARM ARM strongly
suggest that
these are handled as an UNDEF, and I wish KVM would keep enforcing this.

Finally, there's a much wider discussion to be had on whether we want enable non-architectural guests. I must say that I'm not overly keen on it, but I'm willing to hear about the various use cases you could have in mind. I guess
that this is aimed at the emulation of vertically integrated operating
systems,
but that's only a guess...

Thanks,

        M.
--
Jazz is not dead. It just smells funny...

I must admit that I haven't taken an in-depth look at 32-bit accesses,
since it was out of scope of our project (which currently focuses only
on newish versions of iOS). IMO, the patch is useful as is, even without
the 32-bit support - but I can see how it looks like a job half-done.

The 32bit guest support definitely isn't optional. I appreciate that you
focus on 64bit, but having different behaviours between the two is not
really something I want to entertain.

Unfortunately, we don't have a 32-bit setup that would allow us to
properly test the code before submitting. I could copy the changes over,
but submitting the patches without actually testing them seems
irresponsible to me.

Well, you seem to at least have QEMU. This should be enough to test
*something*. Also, if you're testing this with KVM, what are you running
KVM on?

Interestingly, EL0 access to implementation-defined registers currently
results in an UNDEF, even though I expected it to be passed on to our
handler (I saw this behavior with a custom system register we defined
for direct communication with the hypervisor from a user-mode program we
developed). I tried following the ARM documentation to figure out what
could cause such a behavior, but so far I'm at a loss.

Here's your answer:

"When the value of HCR_EL2.TIDCP is 1, it is IMPLEMENTATION DEFINED whether any of this functionality accessed from EL0 is trapped to EL2. If it is not,
then it is UNDEFINED, and any attempt to access it from EL0 generates an
exception that is taken to EL1."

Also, I don't really understand how you define a custom system register.
Unless you're writing the HW as well, of course.

As far as the whole idea of supporting non-architectural guests... First
of all, you're absolutely spot-on regarding your guess - as mentioned
above, we're attempting to virtualize iOS. I believe that in the current
state of the patch, given its opt-in nature and limited scope
(implementation-defined registers), its benefits (the potential to
successfully execute non-architetural guests) outweigh the negatives
(extra code in the kernel, that's unnecessary to most of the guests
under ARM/KVM). I would be happy to give more examples (other than iOS)
that use these registers, but unfortunately, I'm not aware of any such
systems (though I wouldn't be surprised if Samsung has some
Exynos-specifc code, but that's just a shot in the dark).

Oh, absolutely *all* implementations have IMPDEF registers. You just
have to look at the TRM for any of the ARM implementations, which
document a number of them.

And as for having non-architectural code in the kernel, that's of course,
as you've guessed, a total no-no-no. We made that mistake on 32bit, and
stopped quickly!

My worry is more that you'd need to start faking a lot more than just a
couple of IMPDEP sysregs. All the ID registers, for a start. Then find
ways to deal with the host's errata, that the guest has to know about
(despite the ID registers being faked). And make sure that no workaround
that happen in the guest result in anything unexpected.

Some of that is possible (the ID register bit, although it requires some
very serious work), and you could cross your fingers hoping that no erratum
will get in the way. But it seems pretty fragile...

Thanks,

        M.
--
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Reply via email to