On 2017-08-23 15:06, Gustavo Lima Chaves wrote:
> * Jan Kiszka <[email protected]> [2017-08-23 00:00:09 +0000]:
> 
>> On 2017-08-22 18:57, Gustavo Lima Chaves wrote:
>>> This is a first take on the TODO-list entry
>>>
>>>   - whitelist-based MSR access [v1.0]
>>>
>>> *for Intel architecture*. All the architectural MSRs where given a look
>>> before the coding started: they were categorized, for ease of finding
>>> things out when one needs to refer back to them and, for the ones that
>>> were absolutely necessary (at least given the needs of a typical x86
>>> Linux build, on both root and inmate cell contexts), access without
>>> VM-exits was granted. Some actual model specific entries are there as
>>> well, noticed while testing with our baremetal hardware.
>>>
>>> Machine-check exception, thermal event interrupts and others, at least
>>> on IA, can commonly have scope broader than current core only (e. g. the
>>> whole package). We tried the best only to give access to registers in
>>> that domain that would not impact other cores in any hazardous way, e.
>>> g. enable/disable some MCE errors. We only made such accesses possible
>>> because Linux relies on them. Currently we're doing nothing on writes
>>> for these problematic cases and everything seems to run just fine on the
>>> inmates.
>>>
>>> A lot of Linux requirements regarding MSR access could be checked when
>>> destroying other inmates (or disabling the hypervisor altogether), when
>>> the CPUs go back to the root cell and it has to bring them online
>>> again—a lot of MSR interaction happens at those routines. The rest of
>>> the required MSRs could be checked running Linux as inmate.
>>>
>>> The whitelist is structured as to be easy as possible to receive
>>> additions/corrections.
>>>
>>> Signed-off-by: Gustavo Lima Chaves <[email protected]>
>>> ---
>>>  hypervisor/arch/x86/include/asm/processor.h |  12 ++
>>>  hypervisor/arch/x86/vcpu.c                  |  41 ++++++
>>>  hypervisor/arch/x86/vmx.c                   | 218 
>>> +++++++++++++++++++++++-----
>>>  3 files changed, 236 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/hypervisor/arch/x86/include/asm/processor.h 
>>> b/hypervisor/arch/x86/include/asm/processor.h
>>> index a658039..3236ab7 100644
>>> --- a/hypervisor/arch/x86/include/asm/processor.h
>>> +++ b/hypervisor/arch/x86/include/asm/processor.h
>>> @@ -72,12 +72,24 @@
>>>  
>>>  #define MSR_IA32_APICBASE                          0x0000001b
>>>  #define MSR_IA32_FEATURE_CONTROL                   0x0000003a
>>> +#define MSR_IA32_MCG_CTL                           0x0000017b
>>>  #define MSR_IA32_PAT                                       0x00000277
>>>  #define MSR_IA32_MTRR_DEF_TYPE                             0x000002ff
>>>  #define MSR_IA32_SYSENTER_CS                               0x00000174
>>>  #define MSR_IA32_SYSENTER_ESP                              0x00000175
>>>  #define MSR_IA32_SYSENTER_EIP                              0x00000176
>>> +#define MSR_IA32_PERF_CTL                          0x00000199
>>> +#define MSR_IA32_THERM_INTERRUPT                   0x0000019b
>>> +#define MSR_IA32_MISC_ENABLE                               0x000001a0
>>> +#define MSR_OFFCORE_RSP_0                          0x000001a6
>>> +#define MSR_OFFCORE_RSP_1                          0x000001a7
>>> +#define MSR_IA32_PACKAGE_THERM_INTERRUPT           0x000001b2
>>> +#define MSR_IA32_MC0_CTL2                          0x00000280
>>> +#define MSR_IA32_MC31_CTL2                         0x0000029f
>>> +#define MSR_IA32_FIXED_CTR_CTRL                    0x0000038d
>>>  #define MSR_IA32_PERF_GLOBAL_CTRL                  0x0000038f
>>> +#define MSR_IA32_MC0_CTL                           0x00000400
>>> +#define MSR_IA32_MC28_MISC                         0x00000473
>>>  #define MSR_IA32_VMX_BASIC                         0x00000480
>>>  #define MSR_IA32_VMX_PINBASED_CTLS                 0x00000481
>>>  #define MSR_IA32_VMX_PROCBASED_CTLS                        0x00000482
>>> diff --git a/hypervisor/arch/x86/vcpu.c b/hypervisor/arch/x86/vcpu.c
>>> index 638d166..c9e541d 100644
>>> --- a/hypervisor/arch/x86/vcpu.c
>>> +++ b/hypervisor/arch/x86/vcpu.c
>>> @@ -35,6 +35,7 @@ static u8 __attribute__((aligned(PAGE_SIZE))) 
>>> parking_code[PAGE_SIZE] = {
>>>  };
>>>  
>>>  struct paging_structures parking_pt;
>>> +static u32 misc_enable_reserved_bits = 0x1;
>>
>> Why a variable? Seems constant.
> 
> Ugh, will fix.
> 
>>
>>>  
>>>  int vcpu_early_init(void)
>>>  {
>>> @@ -325,6 +326,46 @@ bool vcpu_handle_msr_write(void)
>>>             vcpu_vendor_set_guest_pat((val & MTRR_ENABLE) ?
>>>                                       cpu_data->pat : 0);
>>>             break;
>>> +   case MSR_IA32_MISC_ENABLE:
>>> +           /* Check for Fast-Strings Enable bit only set */
>>> +           val = get_wrmsr_value(&cpu_data->guest_regs);
>>> +           if ((misc_enable_reserved_bits & val)
>>> +               != misc_enable_reserved_bits) {
>>> +                   printk("FATAL: Invalid value on MSR_IA32_MISC_ENABLE "
>>> +                          "write: %lx\n", val);
>>> +                   return false;
>>> +           }
> 
> For all MSRs here, it seems Linux does not give a sh*t when it reads
> back the same value, but of course more testing is always valid.
> 
> 
>>> +           /* No-op for the following, for they may affect things
>>> +            * on a granularity bigger than originating core
>>> +            * and/or they will be handled later on open TODO
>>> +            * entries (e.g. MCE processing and managed
>>> +            * forwarding) */
>>> +   case MSR_IA32_MCG_CTL:
>>> +           /* Enables/disables MCE reporting (globally) */
> 
> For this one, it seems KVM just keeps a copy to W/R from, doing
> nothing with it.
> 
>>> +   case MSR_IA32_MC0_CTL ... MSR_IA32_MC28_MISC:
>>> +           /* Control signaling of MC for errors produced by a
>>> +            * particular hardware unit */
> 
> Ditto, at least from a quick glance.
> 
>>> +   case MSR_IA32_MC0_CTL2 ... MSR_IA32_MC31_CTL2:
>>> +           /* Programming interface to use corrected MC error
>>> +            * signaling */
>>> +   case MSR_IA32_PERF_CTL:
>>> +           /* Used to temporarily disable opportunistic processor
>>> +            * performance operation, but may affect the whole
>>> +            * system */
> 
> I just saw reads returning 0 for this one.
> 
>>> +   case MSR_IA32_FIXED_CTR_CTRL:
>>> +           /* Control for fixed-function performance counters. May be
>>> +            * unique per package. */
>>> +   case MSR_IA32_THERM_INTERRUPT:
>>> +   case MSR_IA32_PACKAGE_THERM_INTERRUPT:
>>> +           /* Management of thermal events. The non-package
>>> +            * variant may still be unique on some
>>> +            * micro-architectures */
>>> +   case MSR_OFFCORE_RSP_0:
>>> +   case MSR_OFFCORE_RSP_1:
>>> +           /* These offcore counters have information on shared
>>> +            * resources, so we'd better block at least writing on
>>> +            * them */
>>
>> None of these ignored writes cause troubles to Linux when it reads them
>> back? Does KVM do something more for any of them?
> 
> That's the quick impression on KVM code WRT that.

OK.

> 
>>
>>> +           break;
>>>     default:
>>>             panic_printk("FATAL: Unhandled MSR write: %lx\n",
>>>                          cpu_data->guest_regs.rcx);
>>> diff --git a/hypervisor/arch/x86/vmx.c b/hypervisor/arch/x86/vmx.c
>>> index 0a6e0ce..0868901 100644
>>> --- a/hypervisor/arch/x86/vmx.c
>>> +++ b/hypervisor/arch/x86/vmx.c
>>> @@ -34,48 +34,196 @@ static const struct segment invalid_seg = {
>>>     .access_rights = 0x10000
>>>  };
>>>  
>>> -/* bit cleared: direct access allowed */
>>> -// TODO: convert to whitelist
>>> +/* MSR access whitelist: each bit *set* (LSB 0) will cause a VM-exit,
>>> + * so all explicit registers listed out of "denial slices" are safe to
>>> + * be accessed in the given mode without Jailhouse's intervention */
>>>  static u8 __attribute__((aligned(PAGE_SIZE))) msr_bitmap[][0x2000/8] = {
>>> -   [ VMX_MSR_BMP_0000_READ ] = {
>>> -           [      0/8 ...  0x26f/8 ] = 0,
>>> -           [  0x270/8 ...  0x277/8 ] = 0x80, /* 0x277 */
>>> -           [  0x278/8 ...  0x2f7/8 ] = 0,
>>> -           [  0x2f8/8 ...  0x2ff/8 ] = 0x80, /* 0x2ff */
>>> -           [  0x300/8 ...  0x7ff/8 ] = 0,
>>> -           [  0x800/8 ...  0x807/8 ] = 0x0c, /* 0x802, 0x803 */
>>> -           [  0x808/8 ...  0x80f/8 ] = 0xa5, /* 0x808, 0x80a, 0x80d, 0x80f 
>>> */
>>> -           [  0x810/8 ...  0x817/8 ] = 0xff, /* 0x810 - 0x817 */
>>> -           [  0x818/8 ...  0x81f/8 ] = 0xff, /* 0x818 - 0x81f */
>>> -           [  0x820/8 ...  0x827/8 ] = 0xff, /* 0x820 - 0x827 */
>>> -           [  0x828/8 ...  0x82f/8 ] = 0x81, /* 0x828, 0x82f */
>>> -           [  0x830/8 ...  0x837/8 ] = 0xfd, /* 0x830, 0x832 - 0x837 */
>>> -           [  0x838/8 ...  0x83f/8 ] = 0x43, /* 0x838, 0x839, 0x83e */
>>> -           [  0x840/8 ... 0x1fff/8 ] = 0,
>>> +    [ VMX_MSR_BMP_0000_READ ] = {
>>> +           /* Deny everything first */
>>> +           [   0x0/8 ...  0x1fff/8 ] = 0xff,
>>
>> Hmm, that will prevent us from enabled -Werror=override-init, like we
>> did for configs. Not yet sure if that is bad or acceptable. Is
>> overriding like this well defined behaviour in C?
> 
> Mm, can we use -Wno-override-init just for this object, maybe? An
> earlier version of this patch here had all the wholes of 0xff filled
> by hand and it was a huge PITA to maintain, believe me.

We don't have many of such tables, and I am fine with this safer
approach - if it is not on the edge of the compiler spec.

Jan

-- 
You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to