On 2017-08-23 15:06, Gustavo Lima Chaves wrote:
> * Jan Kiszka <[email protected]> [2017-08-23 00:00:09 +0000]:
>
>> On 2017-08-22 18:57, Gustavo Lima Chaves wrote:
>>> This is a first take on the TODO-list entry
>>>
>>> - whitelist-based MSR access [v1.0]
>>>
>>> *for Intel architecture*. All the architectural MSRs where given a look
>>> before the coding started: they were categorized, for ease of finding
>>> things out when one needs to refer back to them and, for the ones that
>>> were absolutely necessary (at least given the needs of a typical x86
>>> Linux build, on both root and inmate cell contexts), access without
>>> VM-exits was granted. Some actual model specific entries are there as
>>> well, noticed while testing with our baremetal hardware.
>>>
>>> Machine-check exception, thermal event interrupts and others, at least
>>> on IA, can commonly have scope broader than current core only (e. g. the
>>> whole package). We tried the best only to give access to registers in
>>> that domain that would not impact other cores in any hazardous way, e.
>>> g. enable/disable some MCE errors. We only made such accesses possible
>>> because Linux relies on them. Currently we're doing nothing on writes
>>> for these problematic cases and everything seems to run just fine on the
>>> inmates.
>>>
>>> A lot of Linux requirements regarding MSR access could be checked when
>>> destroying other inmates (or disabling the hypervisor altogether), when
>>> the CPUs go back to the root cell and it has to bring them online
>>> again—a lot of MSR interaction happens at those routines. The rest of
>>> the required MSRs could be checked running Linux as inmate.
>>>
>>> The whitelist is structured as to be easy as possible to receive
>>> additions/corrections.
>>>
>>> Signed-off-by: Gustavo Lima Chaves <[email protected]>
>>> ---
>>> hypervisor/arch/x86/include/asm/processor.h | 12 ++
>>> hypervisor/arch/x86/vcpu.c | 41 ++++++
>>> hypervisor/arch/x86/vmx.c | 218
>>> +++++++++++++++++++++++-----
>>> 3 files changed, 236 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/hypervisor/arch/x86/include/asm/processor.h
>>> b/hypervisor/arch/x86/include/asm/processor.h
>>> index a658039..3236ab7 100644
>>> --- a/hypervisor/arch/x86/include/asm/processor.h
>>> +++ b/hypervisor/arch/x86/include/asm/processor.h
>>> @@ -72,12 +72,24 @@
>>>
>>> #define MSR_IA32_APICBASE 0x0000001b
>>> #define MSR_IA32_FEATURE_CONTROL 0x0000003a
>>> +#define MSR_IA32_MCG_CTL 0x0000017b
>>> #define MSR_IA32_PAT 0x00000277
>>> #define MSR_IA32_MTRR_DEF_TYPE 0x000002ff
>>> #define MSR_IA32_SYSENTER_CS 0x00000174
>>> #define MSR_IA32_SYSENTER_ESP 0x00000175
>>> #define MSR_IA32_SYSENTER_EIP 0x00000176
>>> +#define MSR_IA32_PERF_CTL 0x00000199
>>> +#define MSR_IA32_THERM_INTERRUPT 0x0000019b
>>> +#define MSR_IA32_MISC_ENABLE 0x000001a0
>>> +#define MSR_OFFCORE_RSP_0 0x000001a6
>>> +#define MSR_OFFCORE_RSP_1 0x000001a7
>>> +#define MSR_IA32_PACKAGE_THERM_INTERRUPT 0x000001b2
>>> +#define MSR_IA32_MC0_CTL2 0x00000280
>>> +#define MSR_IA32_MC31_CTL2 0x0000029f
>>> +#define MSR_IA32_FIXED_CTR_CTRL 0x0000038d
>>> #define MSR_IA32_PERF_GLOBAL_CTRL 0x0000038f
>>> +#define MSR_IA32_MC0_CTL 0x00000400
>>> +#define MSR_IA32_MC28_MISC 0x00000473
>>> #define MSR_IA32_VMX_BASIC 0x00000480
>>> #define MSR_IA32_VMX_PINBASED_CTLS 0x00000481
>>> #define MSR_IA32_VMX_PROCBASED_CTLS 0x00000482
>>> diff --git a/hypervisor/arch/x86/vcpu.c b/hypervisor/arch/x86/vcpu.c
>>> index 638d166..c9e541d 100644
>>> --- a/hypervisor/arch/x86/vcpu.c
>>> +++ b/hypervisor/arch/x86/vcpu.c
>>> @@ -35,6 +35,7 @@ static u8 __attribute__((aligned(PAGE_SIZE)))
>>> parking_code[PAGE_SIZE] = {
>>> };
>>>
>>> struct paging_structures parking_pt;
>>> +static u32 misc_enable_reserved_bits = 0x1;
>>
>> Why a variable? Seems constant.
>
> Ugh, will fix.
>
>>
>>>
>>> int vcpu_early_init(void)
>>> {
>>> @@ -325,6 +326,46 @@ bool vcpu_handle_msr_write(void)
>>> vcpu_vendor_set_guest_pat((val & MTRR_ENABLE) ?
>>> cpu_data->pat : 0);
>>> break;
>>> + case MSR_IA32_MISC_ENABLE:
>>> + /* Check for Fast-Strings Enable bit only set */
>>> + val = get_wrmsr_value(&cpu_data->guest_regs);
>>> + if ((misc_enable_reserved_bits & val)
>>> + != misc_enable_reserved_bits) {
>>> + printk("FATAL: Invalid value on MSR_IA32_MISC_ENABLE "
>>> + "write: %lx\n", val);
>>> + return false;
>>> + }
>
> For all MSRs here, it seems Linux does not give a sh*t when it reads
> back the same value, but of course more testing is always valid.
>
>
>>> + /* No-op for the following, for they may affect things
>>> + * on a granularity bigger than originating core
>>> + * and/or they will be handled later on open TODO
>>> + * entries (e.g. MCE processing and managed
>>> + * forwarding) */
>>> + case MSR_IA32_MCG_CTL:
>>> + /* Enables/disables MCE reporting (globally) */
>
> For this one, it seems KVM just keeps a copy to W/R from, doing
> nothing with it.
>
>>> + case MSR_IA32_MC0_CTL ... MSR_IA32_MC28_MISC:
>>> + /* Control signaling of MC for errors produced by a
>>> + * particular hardware unit */
>
> Ditto, at least from a quick glance.
>
>>> + case MSR_IA32_MC0_CTL2 ... MSR_IA32_MC31_CTL2:
>>> + /* Programming interface to use corrected MC error
>>> + * signaling */
>>> + case MSR_IA32_PERF_CTL:
>>> + /* Used to temporarily disable opportunistic processor
>>> + * performance operation, but may affect the whole
>>> + * system */
>
> I just saw reads returning 0 for this one.
>
>>> + case MSR_IA32_FIXED_CTR_CTRL:
>>> + /* Control for fixed-function performance counters. May be
>>> + * unique per package. */
>>> + case MSR_IA32_THERM_INTERRUPT:
>>> + case MSR_IA32_PACKAGE_THERM_INTERRUPT:
>>> + /* Management of thermal events. The non-package
>>> + * variant may still be unique on some
>>> + * micro-architectures */
>>> + case MSR_OFFCORE_RSP_0:
>>> + case MSR_OFFCORE_RSP_1:
>>> + /* These offcore counters have information on shared
>>> + * resources, so we'd better block at least writing on
>>> + * them */
>>
>> None of these ignored writes cause troubles to Linux when it reads them
>> back? Does KVM do something more for any of them?
>
> That's the quick impression on KVM code WRT that.
OK.
>
>>
>>> + break;
>>> default:
>>> panic_printk("FATAL: Unhandled MSR write: %lx\n",
>>> cpu_data->guest_regs.rcx);
>>> diff --git a/hypervisor/arch/x86/vmx.c b/hypervisor/arch/x86/vmx.c
>>> index 0a6e0ce..0868901 100644
>>> --- a/hypervisor/arch/x86/vmx.c
>>> +++ b/hypervisor/arch/x86/vmx.c
>>> @@ -34,48 +34,196 @@ static const struct segment invalid_seg = {
>>> .access_rights = 0x10000
>>> };
>>>
>>> -/* bit cleared: direct access allowed */
>>> -// TODO: convert to whitelist
>>> +/* MSR access whitelist: each bit *set* (LSB 0) will cause a VM-exit,
>>> + * so all explicit registers listed out of "denial slices" are safe to
>>> + * be accessed in the given mode without Jailhouse's intervention */
>>> static u8 __attribute__((aligned(PAGE_SIZE))) msr_bitmap[][0x2000/8] = {
>>> - [ VMX_MSR_BMP_0000_READ ] = {
>>> - [ 0/8 ... 0x26f/8 ] = 0,
>>> - [ 0x270/8 ... 0x277/8 ] = 0x80, /* 0x277 */
>>> - [ 0x278/8 ... 0x2f7/8 ] = 0,
>>> - [ 0x2f8/8 ... 0x2ff/8 ] = 0x80, /* 0x2ff */
>>> - [ 0x300/8 ... 0x7ff/8 ] = 0,
>>> - [ 0x800/8 ... 0x807/8 ] = 0x0c, /* 0x802, 0x803 */
>>> - [ 0x808/8 ... 0x80f/8 ] = 0xa5, /* 0x808, 0x80a, 0x80d, 0x80f
>>> */
>>> - [ 0x810/8 ... 0x817/8 ] = 0xff, /* 0x810 - 0x817 */
>>> - [ 0x818/8 ... 0x81f/8 ] = 0xff, /* 0x818 - 0x81f */
>>> - [ 0x820/8 ... 0x827/8 ] = 0xff, /* 0x820 - 0x827 */
>>> - [ 0x828/8 ... 0x82f/8 ] = 0x81, /* 0x828, 0x82f */
>>> - [ 0x830/8 ... 0x837/8 ] = 0xfd, /* 0x830, 0x832 - 0x837 */
>>> - [ 0x838/8 ... 0x83f/8 ] = 0x43, /* 0x838, 0x839, 0x83e */
>>> - [ 0x840/8 ... 0x1fff/8 ] = 0,
>>> + [ VMX_MSR_BMP_0000_READ ] = {
>>> + /* Deny everything first */
>>> + [ 0x0/8 ... 0x1fff/8 ] = 0xff,
>>
>> Hmm, that will prevent us from enabled -Werror=override-init, like we
>> did for configs. Not yet sure if that is bad or acceptable. Is
>> overriding like this well defined behaviour in C?
>
> Mm, can we use -Wno-override-init just for this object, maybe? An
> earlier version of this patch here had all the wholes of 0xff filled
> by hand and it was a huge PITA to maintain, believe me.
We don't have many of such tables, and I am fine with this safer
approach - if it is not on the edge of the compiler spec.
Jan
--
You received this message because you are subscribed to the Google Groups
"Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.