On 7/4/19 5:24 PM, Jan Kiszka wrote:
> On 04.07.19 17:18, Ralf Ramsauer wrote:
>>
>>
>> On 7/4/19 4:39 PM, Jan Kiszka wrote:
>>> On 04.07.19 15:43, Ralf Ramsauer wrote:
>>>> Hi,
>>>>
>>>> we have some trouble starting non-root Linux on an AMD box. I already
>>>> tried to narrow things down, but it raised several questions.
>>>>
>>>>
>>>> The main problem is that non-root Linux tries to write to LVT0, and
>>>> Jailhouse crashes with:
>>>>
>>>> FATAL: Setting invalid LVT delivery mode (reg 35, value 00000700)
>>>>
>>>>
>>>> Turns out that, in contrast to Intel x86, we don't trap on APIC reads
>>>> on AMD; we only intercept APIC writes (cf. svm.c:338). I thought this
>>>> would be the cause of this bug, as that's an obvious difference between
>>>> Intel and AMD: on VMX, we do trap xAPIC reads and writes. However, VMX
>>>> works slightly differently in this regard (side note: [1]).
>>>>
>>>> xAPIC reads on AMD systems don't trap to the hypervisor, so I
>>>> intercepted reads (by removing the present bit of the XAPIC_PAGE of
>>>> the guest), and forwarded the traps to the apic dispatcher (adjusted
>>>> VMEXIT_NPF).
>>>>
>>>> I can confirm that we now trap reads as well as writes. But the
>>>> non-root Linux still crashes with the same error.
>>>>
>>>> Digging a bit deeper, I found out that xAPIC reads are directly
>>>> forwarded to the hardware, if they were intercepted. So this explains
>>>> why the bug still remains. This raised another question regarding xAPIC
>>>> handling on Intel:
>>>>
>>>> On AMD, we don't intercept xAPIC reads. On Intel, we do, as we
>>>> follow the strategy mentioned in [1]… But why?
>>>
>>> It accelerates write dispatching at least. I never did the comparison
>>> if using a different access scheme would be beneficial because xAPIC is
>>> practically dead on Intel.
>>
>> Hmm... The change and benchmark should be pretty easy. Once a bunch of
>> other issues is solved, I'll maybe have a look at this.
>>
>
> As I said: you will optimize a legacy code path, not practically
> relevant. If that will simplify the code, though, I might still be
> interested :).
>
>>>
>>>>
>>>> Wouldn't it be more performant to just trap on xAPIC writes on
>>>> Intel? This could be done by switching from APIC_ACCESS interception
>>>> to simple write-only trap & emulate (page faults).
>>>>
>>>> However, back to the initial issue. Looks like the difference between
>>>> Intel and AMD boot is as follows.
>>>>
>>>> AMD:
>>>> [ 0.325578] Switched APIC routing to physical flat.
>>>> [ 0.366464] enabled ExtINT on CPU#0
>>>>
>>>> Intel:
>>>> [ 0.099486] Switched APIC routing to physical flat.
>>>> [ 0.113000] masked ExtINT on CPU#0
>>>>
>>>>
>>>> This is why the above-mentioned Jailhouse crash occurs. I tried to find
>>>> out why Linux takes this decision on AMD. Our victim is in apic.c:1587.
>>>>
>>>> On Intel, apic_read(LVT0) & APIC_LVT_MASKED results in 65536, on AMD it
>>>> is 0. This is why we take a different path.
>>>>
>>>> Now the question is simple -- why? :-)
>>>>
>>>> Are we just lacking ExtINT delivery mode in Jailhouse, or is anything
>>>> else odd?
>>>
>>> Yes, the ExtINT makes no sense for secondary cells, and it should also
>>> not be needed for primary ones. Let's dig deeper:
>>>
>>>     value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
>>>     if (!cpu && (pic_mode || !value || skip_ioapic_setup)) {
>>>         value = APIC_DM_EXTINT;
>>>         apic_printk(APIC_VERBOSE, "enabled ExtINT on CPU#%d\n", cpu);
>>>
>>> What are the values here, and which are different?
>>
>> As already mentioned above, only value differs:
>>
>>>> On Intel, apic_read(LVT0) & APIC_LVT_MASKED results in 65536, on AMD
>>>> it is 0. This is why we take a different path.
>>
>> cpu, pic_mode and skip_ioapic_setup are 0 on both machines.
>
> Ah, ok. Then you need to find the evil guy unmasking LVT0 before that.
> Can't be Jailhouse: we hand it over masked.

Yes, I checked this. Actually we do. But...

When the cell is created after Jailhouse is enabled, apic_clear() is
called when the SIPI is received. I added some instrumentation there: at
that moment, LVT0 holds (and keeps) 0x10000.

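To make the two LVT0 values in this thread easier to compare, here is a
minimal sketch that decodes an LVT register value into its fields. It
only assumes the architectural xAPIC LVT layout (vector in bits 0-7,
delivery mode in bits 8-10, mask in bit 16); the macro and function
names are hypothetical and are not taken from Jailhouse or Linux.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural xAPIC LVT layout: vector in bits 0-7,
 * delivery mode in bits 8-10, mask in bit 16. */
#define LVT_VECTOR(v)        ((v) & 0xffu)
#define LVT_DELIVERY_MODE(v) (((v) >> 8) & 0x7u)
#define LVT_MASKED(v)        (((v) >> 16) & 0x1u)

/* Hypothetical helper: map the delivery mode field to a name. */
static const char *lvt_delivery_mode_name(uint32_t lvt)
{
	switch (LVT_DELIVERY_MODE(lvt)) {
	case 0: return "Fixed";
	case 2: return "SMI";
	case 4: return "NMI";
	case 5: return "INIT";
	case 7: return "ExtINT";
	default: return "reserved";
	}
}

/* Hypothetical helper: print one decoded LVT value. */
static void lvt_dump(const char *tag, uint32_t lvt)
{
	printf("%s: 0x%08x mode=%s masked=%u vector=0x%02x\n",
	       tag, (unsigned int)lvt, lvt_delivery_mode_name(lvt),
	       (unsigned int)LVT_MASKED(lvt), (unsigned int)LVT_VECTOR(lvt));
}
```

Decoded this way, 0x10000 is a masked entry with delivery mode Fixed,
while 0x00000700 (the value in the FATAL message) is an unmasked ExtINT
entry.
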
In addition, I instrumented the linux-loader. There, I read back LVT0
very early, before we hand over to Linux. No one else touches LVT0 in
the meantime; I would see any other guest access, as interceptions are
instrumented (both reads and writes).

So in the linux-loader, the read-back causes a vmexit, and I read back
0x0. That's really strange: there is - afaict - no other access in the
meantime.

I don't know what's going on there. I don't see any other modifications
of LVT registers in code paths other than apic_clear().

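For reference, the check quoted above can be modelled in isolation to
show how the LVT0 read-back alone decides the path. This is only a
sketch: the function name is hypothetical, the constants are the
well-known values (APIC_LVT_MASKED is bit 16, i.e. 65536; APIC_DM_EXTINT
is 0x700), and the else branch is an assumption based on the
"masked ExtINT" message in the Intel boot log, since the quoted snippet
stops before it.

```c
#include <stdint.h>

#define APIC_LVT_MASKED	(1u << 16)	/* 0x10000 == 65536 */
#define APIC_DM_EXTINT	0x00700u

/*
 * Hypothetical stand-alone model of the LVT0 decision quoted above.
 * Returns the value that would be written back to LVT0.
 */
static uint32_t lvt0_value_to_write(uint32_t lvt0_read, int cpu,
				    int pic_mode, int skip_ioapic_setup)
{
	uint32_t value = lvt0_read & APIC_LVT_MASKED;

	if (!cpu && (pic_mode || !value || skip_ioapic_setup))
		return APIC_DM_EXTINT;		/* "enabled ExtINT" */

	/* Assumed else branch: ExtINT stays masked. */
	return APIC_DM_EXTINT | APIC_LVT_MASKED; /* "masked ExtINT" */
}
```

With cpu, pic_mode and skip_ioapic_setup all 0, a read-back of 0x10000
yields 0x10700, whereas a read-back of 0x0 yields 0x700, which is
exactly the value Jailhouse rejects.
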
Ralf
>
> Jan
>