On 6/5/19 12:06 AM, Jan Kiszka wrote:
> On 04.06.19 23:02, Ralf Ramsauer wrote:
>> mov (%rax), %ax is a 16-bit data MOV_FROM_MEM that will emit
>> 0x66 0x8b 0x00.
>>
>> 0x66 is the operand-size override prefix which we currently do not
>> support.
>>
>> We should support it, as we can find this opcode, for example, for some
>> mmconfig space access from Linux (e.g., pci_generic_config_read).
>>
>> This also adds appropriate mmio-access tests.
>>
>> Tested in QEMU virtual target.
>>
>> Signed-off-by: Ralf Ramsauer <[email protected]>
>> ---
>>   hypervisor/arch/x86/include/asm/processor.h |  1 +
>>   hypervisor/arch/x86/mmio.c                  | 37 +++++++++++++++++----
>>   inmates/tests/x86/mmio-access-32.c          |  4 +++
>>   inmates/tests/x86/mmio-access.c             |  4 +++
>>   4 files changed, 40 insertions(+), 6 deletions(-)
>>
>> diff --git a/hypervisor/arch/x86/include/asm/processor.h b/hypervisor/arch/x86/include/asm/processor.h
>> index 70a6c3ff..d8111690 100644
>> --- a/hypervisor/arch/x86/include/asm/processor.h
>> +++ b/hypervisor/arch/x86/include/asm/processor.h
>> @@ -145,6 +145,7 @@
>>     #define X86_REX_CODE                    4
>>   +#define X86_PREFIX_OP_SZ                0x66
>>   #define X86_PREFIX_ADDR_SZ                0x67
>>     #define X86_OP_MOVZX_OPC1                0x0f
>> diff --git a/hypervisor/arch/x86/mmio.c b/hypervisor/arch/x86/mmio.c
>> index b234bd79..6d9ad1c5 100644
>> --- a/hypervisor/arch/x86/mmio.c
>> +++ b/hypervisor/arch/x86/mmio.c
>> @@ -79,6 +79,26 @@ static unsigned int get_address_width(bool has_addrsz_prefix)
>>           (!!(cs_attr & VCPU_CS_DB) ^ has_addrsz_prefix) ? 4 : 2;
>>   }
>>   +static unsigned int get_op_width(bool has_rex_w, bool has_opsz_prefix)
> 
> We should move all the flags into parse_context so that we can pass them
> around more easily.

Good point.

> 
>> +{
>> +    u16 cs_attr;
>> +    bool long_mode;
>> +
>> +    /* Op size prefix is ignored if rex.w = 1 */
>> +    if (has_rex_w)
>> +        return 8;
>> +
>> +    cs_attr = vcpu_vendor_get_cs_attr();
>> +    long_mode = (vcpu_vendor_get_efer() & EFER_LMA) &&
>> +            (cs_attr & VCPU_CS_L);
> 
> This may mean accessing the same VMCS regs multiple times. I vaguely
> recall from KVM that it pays off to avoid that and keep the results
> cached (per vmexit).

See comment below…

(BTW: not that it would solve this issue, but we could also consider
inlining the vcpu_vendor_get_* accessors)

> 
>> +
>> +    if (long_mode)
>> +        /* CS.d is ignored in long mode */
>> +        return has_opsz_prefix ? 2 : 4;
>> +
>> +    return (!!(cs_attr & VCPU_CS_DB) ^ has_opsz_prefix) ? 4 : 2;
> 
> This does the same as get_address_width (minus different output values),
> but its code format is different. Should be aligned.

Yeah. I chose this style because it took me a while to understand what

long_mode ? (has_addrsz_prefix ? 4 : 8) : (!!(cs_attr & VCPU_CS_DB) ^
has_addrsz_prefix) ? 4 : 2;

actually means -- with respect to the operator precedence of ?: (which
is unambiguous here, but I still got confused). May I propose to rather
align the line mentioned above to this style? (But let's see -- maybe we
can consolidate both helpers anyway.)
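
(For the record, ?: groups right to left, so the one-liner is equivalent
to the explicit branches below. A quick self-contained check, with
VCPU_CS_DB stubbed to an arbitrary bit just for this sketch:)

```c
#include <stdbool.h>

#define VCPU_CS_DB	(1 << 14)	/* stand-in bit for this sketch */

/* The original one-liner, relying on ?:'s right-to-left grouping */
static unsigned int width_oneliner(bool long_mode, unsigned short cs_attr,
				   bool has_addrsz_prefix)
{
	return long_mode ? (has_addrsz_prefix ? 4 : 8) :
		(!!(cs_attr & VCPU_CS_DB) ^ has_addrsz_prefix) ? 4 : 2;
}

/* The same logic with the branches spelled out */
static unsigned int width_branches(bool long_mode, unsigned short cs_attr,
				   bool has_addrsz_prefix)
{
	if (long_mode)
		return has_addrsz_prefix ? 4 : 8;
	return (!!(cs_attr & VCPU_CS_DB) ^ has_addrsz_prefix) ? 4 : 2;
}
```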

> 
> In fact, I could imagine a combined helper:
> 
> void parse_widths(struct parse_context *ctx,
>                   struct mmio_instruction *inst,
>                   bool parse_addr_width)
> 
> That one would obtain cs_attr and long_mode only once, would do the
> address width thing only if parse_addr_width is true, and would push its
> results directly into *inst.

... as an inlined single-user function, right?

Good idea, that would also avoid the potential double EFER/cs_attr
access. I'm just curious: is it Intel's vmread that you worry about? At
least on AMD, the guest's EFER is read directly from the VMCB.
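
A minimal sketch of how I understand the combined helper -- with the
vcpu accessors, bit definitions, and struct layout stubbed out here so
the sketch is self-contained (the real definitions live in the
hypervisor headers):

```c
#include <stdbool.h>

/* Stand-in definitions; the real ones come from the hypervisor headers. */
#define EFER_LMA	(1UL << 10)
#define VCPU_CS_L	(1 << 13)
#define VCPU_CS_DB	(1 << 14)

struct parse_context {
	bool has_rex_w;
	bool has_addrsz_prefix;
	bool has_opsz_prefix;
};

struct mmio_instruction {
	unsigned int inst_len;
	unsigned int access_size;
};

/* Stubbed accessors so this compiles outside the hypervisor */
static unsigned long stub_efer;
static unsigned short stub_cs_attr;
static unsigned long vcpu_vendor_get_efer(void) { return stub_efer; }
static unsigned short vcpu_vendor_get_cs_attr(void) { return stub_cs_attr; }

/*
 * Reads EFER/CS attributes only once per call and pushes both widths
 * directly into *inst, as proposed.
 */
static void parse_widths(const struct parse_context *ctx,
			 struct mmio_instruction *inst,
			 bool parse_addr_width)
{
	unsigned short cs_attr = vcpu_vendor_get_cs_attr();
	bool long_mode = (vcpu_vendor_get_efer() & EFER_LMA) &&
		(cs_attr & VCPU_CS_L);
	bool cs_db = !!(cs_attr & VCPU_CS_DB);

	if (parse_addr_width)
		inst->inst_len += long_mode ?
			(ctx->has_addrsz_prefix ? 4 : 8) :
			((cs_db ^ ctx->has_addrsz_prefix) ? 4 : 2);

	if (ctx->has_rex_w)
		/* Op size prefix is ignored if rex.w = 1 */
		inst->access_size = 8;
	else if (long_mode)
		/* CS.d is ignored in long mode */
		inst->access_size = ctx->has_opsz_prefix ? 2 : 4;
	else
		inst->access_size = (cs_db ^ ctx->has_opsz_prefix) ? 4 : 2;
}
```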

> 
>> +}
>> +
>>   struct mmio_instruction
>>   x86_mmio_parse(const struct guest_paging_structures *pg_structs, bool is_write)
>>   {
>> @@ -94,6 +114,7 @@ x86_mmio_parse(const struct guest_paging_structures *pg_structs, bool is_write)
>>       bool has_rex_w = false;
>>       bool has_rex_r = false;
>>       bool has_addrsz_prefix = false;
>> +    bool has_opsz_prefix = false;
>>         if (!ctx_update(&ctx, &pc, 0, pg_structs))
>>           goto error_noinst;
>> @@ -114,9 +135,13 @@ restart:
>>       }
>>       switch (op[0].raw) {
>>       case X86_PREFIX_ADDR_SZ:
>> +    case X86_PREFIX_OP_SZ:
>>           if (!ctx_update(&ctx, &pc, 1, pg_structs))
>>               goto error_noinst;
>> -        has_addrsz_prefix = true;
>> +        if (op[0].raw == X86_PREFIX_ADDR_SZ)
>> +            has_addrsz_prefix = true;
>> +        else
>> +            has_opsz_prefix = true;
> 
> I would avoid the double-dispatching just to save one ctx_update.

Ok.

> 
>>           goto restart;
>>       case X86_OP_MOVZX_OPC1:
>>           if (!ctx_update(&ctx, &pc, 1, pg_structs))
>> @@ -134,28 +159,28 @@ restart:
>>           does_write = true;
>>           break;
>>       case X86_OP_MOV_TO_MEM:
>> -        inst.access_size = has_rex_w ? 8 : 4;
>> +        inst.access_size = get_op_width(has_rex_w, has_opsz_prefix);
>>           does_write = true;
>>           break;
>>       case X86_OP_MOVB_FROM_MEM:
>>           inst.access_size = 1;
>>           break;
>>       case X86_OP_MOV_FROM_MEM:
>> -        inst.access_size = has_rex_w ? 8 : 4;
>> +        inst.access_size = get_op_width(has_rex_w, has_opsz_prefix);
>>           break;
>>       case X86_OP_MOV_IMMEDIATE_TO_MEM:
>> -        inst.access_size = has_rex_w ? 8 : 4;
>> +        inst.access_size = get_op_width(has_rex_w, has_opsz_prefix);
>>           has_immediate = true;
>>           does_write = true;
>>           break;
>>       case X86_OP_MOV_MEM_TO_AX:
>>           inst.inst_len += get_address_width(has_addrsz_prefix);
>> -        inst.access_size = has_rex_w ? 8 : 4;
>> +        inst.access_size = get_op_width(has_rex_w, has_opsz_prefix);
>>           inst.in_reg_num = 15;
>>           goto final;
>>       case X86_OP_MOV_AX_TO_MEM:
>>           inst.inst_len += get_address_width(has_addrsz_prefix);
>> -        inst.access_size = has_rex_w ? 8 : 4;
>> +        inst.access_size = get_op_width(has_rex_w, has_opsz_prefix);
>>           inst.out_val = guest_regs->by_index[15];
>>           does_write = true;
>>           goto final;
>> diff --git a/inmates/tests/x86/mmio-access-32.c b/inmates/tests/x86/mmio-access-32.c
>> index 2f47f211..b4f83850 100644
>> --- a/inmates/tests/x86/mmio-access-32.c
>> +++ b/inmates/tests/x86/mmio-access-32.c
>> @@ -41,6 +41,10 @@ void inmate_main(void)
>>       mmio_write32(mmio_reg, pattern);
>>       EXPECT_EQUAL(*comm_page_reg, pattern);
>>   +    /* MOV_FROM_MEM (8b), 16-bit data, 32-bit address, OP size prefix */
>> +    asm volatile("mov (%%eax), %%ax" : "=a" (reg32) : "a" (mmio_reg));
>> +    EXPECT_EQUAL(reg32, (u16)pattern);
> 
> We should try to cover all cases of current get_op_width.

Ok.

> 
>> +
>>       /* MOV_FROM_MEM (8b), 32-bit data, 32-bit address */
>>       asm volatile("movl (%%ebx), %%eax"
>>           : "=a" (reg32) : "a" (0), "b" (mmio_reg));
>> diff --git a/inmates/tests/x86/mmio-access.c b/inmates/tests/x86/mmio-access.c
>> index 0e6a56b1..86694353 100644
>> --- a/inmates/tests/x86/mmio-access.c
>> +++ b/inmates/tests/x86/mmio-access.c
>> @@ -51,6 +51,10 @@ void inmate_main(void)
>>       mmio_write64(mmio_reg, pattern);
>>       EXPECT_EQUAL(*comm_page_reg, pattern);
>>   +    /* MOV_FROM_MEM (8b), 16-bit data, 0x66 OP size prefix */
>> +    asm volatile("mov (%%rax), %%ax" : "=a" (reg64) : "a" (mmio_reg));
>> +    EXPECT_EQUAL(reg64, (u16)pattern);
>> +
>>       /* MOV_FROM_MEM (8b), 64-bit data, mod=0, reg=0, rm=3 */
>>       asm volatile("movq (%%rbx), %%rax"
>>           : "=a" (reg64) : "a" (0), "b" (mmio_reg));
>>
> 
> Thanks for picking this up. I guess we need to complete that aspect of
> the instruction parsing. Eventually, it will be simpler to argue that
> the parsing is complete than to explain why it is not.

Yeah, I somehow feel sorry that we need this.

Just had a look at KVM's emulate.c. Oh my dear.

  Ralf

> 
> Jan
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.