On 7/5/19 8:55 AM, Jan Kiszka wrote:
> On 04.07.19 22:56, Ralf Ramsauer wrote:
>> On 7/4/19 5:24 PM, Jan Kiszka wrote:
>>> On 04.07.19 17:18, Ralf Ramsauer wrote:
>>>>
>>>>
>>>> On 7/4/19 4:39 PM, Jan Kiszka wrote:
>>>>> On 04.07.19 15:43, Ralf Ramsauer wrote:
>>>>>> Hi,
>>>>>>
>>>>>> we have some trouble starting non-root Linux on an AMD box. I already
>>>>>> tried to narrow things down, but it raised several questions.
>>>>>>
>>>>>>
>>>>>> The main problem is that non-root Linux tries to write to LVT0,
>>>>>> and Jailhouse crashes with:
>>>>>>
>>>>>> FATAL: Setting invalid LVT delivery mode (reg 35, value
>>>>>> 00000700)
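For reference: 0x700 in an LVT entry selects the ExtINT delivery mode
(bits 10:8 = 0b111) with the mask bit (bit 16) clear. The check behind
the FATAL presumably accepts only the fixed and NMI delivery modes; a
minimal sketch of such a filter (names are hypothetical, not
Jailhouse's actual code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_LVT_DLVR_MODE_MASK	0x700u	/* bits 10:8: delivery mode */
#define APIC_LVT_DLVR_FIXED	0x000u
#define APIC_LVT_DLVR_NMI	0x400u
#define APIC_LVT_DLVR_EXTINT	0x700u
#define APIC_LVT_MASKED		(1u << 16)

/* Hypothetical sketch of an LVT write filter: only the fixed and NMI
 * delivery modes are considered safe for a cell to program. */
static bool lvt_write_allowed(uint32_t value)
{
	uint32_t mode = value & APIC_LVT_DLVR_MODE_MASK;

	return mode == APIC_LVT_DLVR_FIXED || mode == APIC_LVT_DLVR_NMI;
}
```

Under this model, a write of 0x00000700 (unmasked ExtINT) is refused,
while the reset value 0x10000 (masked, fixed mode) passes.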
>>>>>>
>>>>>>
>>>>>> Turns out, in contrast to Intel x86, we don't trap on APIC
>>>>>> reads; we only intercept APIC writes on AMD (cf. svm.c:338). I
>>>>>> thought this would be the cause of this bug, as that's an obvious
>>>>>> difference between Intel and AMD: on VMX, we do trap xAPIC reads
>>>>>> and writes. However, VMX works slightly differently in this
>>>>>> regard (side note: [1]).
>>>>>>
>>>>>> xAPIC reads on AMD systems don't trap into the hypervisor, so I
>>>>>> intercepted reads as well (by removing the present bit of the
>>>>>> guest's XAPIC_PAGE) and forwarded the traps to the APIC
>>>>>> dispatcher (adjusted VMEXIT_NPF).
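The present-bit trick can be sketched as follows (the flag bits and
the helper are illustrative, not the actual Jailhouse paging API): a
non-present NPT mapping makes every access to the page, read or write,
exit with VMEXIT_NPF, while a present but read-only mapping lets reads
go straight to hardware and only faults on writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_FLAG_PRESENT	(1ull << 0)
#define PAGE_FLAG_RW		(1ull << 1)

/* Illustrative helper: compute the NPT access flags for the guest's
 * xAPIC page. The page is never mapped writable, so writes always
 * fault into the hypervisor. Without the present bit, reads fault as
 * well and can be routed to the APIC dispatcher; with it, reads hit
 * the hardware APIC directly. */
static uint64_t xapic_page_flags(bool trap_reads)
{
	uint64_t flags = 0;

	if (!trap_reads)
		flags |= PAGE_FLAG_PRESENT; /* reads bypass the hypervisor */
	return flags;
}
```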
>>>>>>
>>>>>> I can confirm that we now trap reads as well as writes. But the
>>>>>> non-root
>>>>>> Linux still crashes with the same error.
>>>>>>
>>>>>> Digging a bit deeper, I found out that intercepted xAPIC reads
>>>>>> are forwarded directly to the hardware, which explains why the
>>>>>> bug remains. This raised another question regarding xAPIC
>>>>>> handling on Intel:
>>>>>>
>>>>>> On AMD, we don't intercept xAPIC reads. On Intel, we do, as we
>>>>>> follow the strategy mentioned in [1]… But why?
>>>>>
>>>>> It accelerates write dispatching at least. I never compared
>>>>> whether using a different access scheme would be beneficial,
>>>>> because xAPIC is practically dead on Intel.
>>>>
>>>> Hmm... The change and benchmark should be pretty easy. Once a
>>>> bunch of other issues are solved, I may have a look at this.
>>>>
>>>
>>> As I said: you would be optimizing a legacy code path that is not
>>> practically relevant. If it simplifies the code, though, I might
>>> still be interested :).
>>>
>>>>>
>>>>>>
>>>>>> Wouldn't it be more performant to trap only on xAPIC writes on
>>>>>> Intel? This could be done by switching from APIC_ACCESS
>>>>>> interception to simple write-only trap & emulate (page faults).
>>>>>>
>>>>>> However, back to the initial issue. Looks like the difference between
>>>>>> Intel and AMD boot is as follows.
>>>>>>
>>>>>> AMD:
>>>>>> [ 0.325578] Switched APIC routing to physical flat.
>>>>>> [ 0.366464] enabled ExtINT on CPU#0
>>>>>>
>>>>>> Intel:
>>>>>> [ 0.099486] Switched APIC routing to physical flat.
>>>>>> [ 0.113000] masked ExtINT on CPU#0
>>>>>>
>>>>>>
>>>>>> This is why the above-mentioned Jailhouse crash occurs. I tried
>>>>>> to find out why Linux makes this decision on AMD. The culprit is
>>>>>> in apic.c:1587.
>>>>>>
>>>>>> On Intel, apic_read(LVT0) & APIC_LVT_MASKED results in 65536; on
>>>>>> AMD it is 0. This is why we take a different path.
>>>>>>
>>>>>> Now the question is simple -- why? :-)
>>>>>>
>>>>>> Are we just lacking ExtINT delivery mode in Jailhouse, or is anything
>>>>>> else odd?
>>>>>
>>>>> Yes, the ExtINT makes no sense for secondary cells, and it should also
>>>>> not be needed for primary ones. Let's dig deeper:
>>>>>
>>>>> value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
>>>>> if (!cpu && (pic_mode || !value || skip_ioapic_setup)) {
>>>>> value = APIC_DM_EXTINT;
>>>>> apic_printk(APIC_VERBOSE, "enabled ExtINT on CPU#%d\n", cpu);
>>>>>
>>>>> What are the values here, and which are different?
>>>>
>>>> As already mentioned above, only value differs:
>>>>
>>>>>> On Intel, apic_read(LVT0) & APIC_LVT_MASKED results in 65536, on AMD
>>>>>> it is 0. This is why we take a different path.
>>>>
>>>> cpu, pic_mode and skip_ioapic_setup are all 0 on both machines.
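For reference, APIC_LVT_MASKED is bit 16, i.e. 1 << 16 = 65536, which
matches the values above. The quoted condition can be modeled as a
small self-contained sketch (a model of the logic, not the kernel code
itself):

```c
#include <assert.h>
#include <stdint.h>

#define APIC_LVT_MASKED	(1u << 16)

/* Model of the apic.c decision quoted above: ExtINT gets enabled on
 * CPU 0 whenever LVT0 reads back with the mask bit clear (value == 0),
 * even with pic_mode and skip_ioapic_setup both 0. */
static int enables_extint(uint32_t lvt0, int cpu, int pic_mode,
			  int skip_ioapic_setup)
{
	uint32_t value = lvt0 & APIC_LVT_MASKED;

	return !cpu && (pic_mode || !value || skip_ioapic_setup);
}
```

With lvt0 = 0x10000 this yields 0 (the Intel case, ExtINT masked);
with lvt0 = 0x0 it yields 1 (the AMD case, ExtINT enabled).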
>>>
>>> Ah, ok. Then you need to find the evil guy unmasking LVT0 before that.
>>> Can't be Jailhouse: we hand it over masked.
>>
>> Yes, I checked this. Actually we do. But...
>>
>> When the cell is created after jailhouse is enabled, apic_clear() will
>> be called when the SIPI is received. There, I added some
>> instrumentation. At that moment, LVT0 holds (and keeps) 0x10000.
>>
>> In addition to that, I instrumented the linux-loader. There, I read
>> back LVT0 very early, before we hand over to Linux. No one else
>> touches LVT0 in the meantime; I would see any other guest access, as
>> interceptions are instrumented (both reads and writes).
>>
>> So in the linux-loader, the read-back causes a vmexit, and I read
>> back 0x0. That's really strange; there is, AFAICT, no other access
>> in the meantime.
>>
>> I don't know what's going on there. I don't see any other modifications
>> of LVT registers in code paths other than apic_clear().
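For completeness, the read-back in the linux-loader amounts to a plain
MMIO read of LVT0 at offset 0x350 of the xAPIC page, which vmexits
once read interception is active. Roughly (using the architectural
default APIC base; helper names are made up):

```c
#include <assert.h>
#include <stdint.h>

#define XAPIC_BASE	0xfee00000u
#define APIC_LVT0	0x350u

/* Address of an xAPIC register in the MMIO page. */
static uintptr_t xapic_reg_addr(uint32_t reg)
{
	return (uintptr_t)XAPIC_BASE + reg;
}

/* The instrumented read: a volatile 32-bit load that, with read
 * interception enabled, traps into the hypervisor's APIC dispatcher.
 * (Only meaningful on real hardware / inside a cell, of course.) */
static inline uint32_t xapic_read(uint32_t reg)
{
	return *(volatile uint32_t *)xapic_reg_addr(reg);
}
```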
>
> Maybe you can lift the setup into KVM and check if you can reproduce
> it there as well. That would allow you to track down the other access
> that does the enabling. It shouldn't be possible that the hardware
> does that on its own.

I tried to run Jailhouse on QEMU on an AMD machine with nested KVM.
I currently see no way to test this on QEMU, as Jailhouse seems to be
pretty unstable there. We crash horribly in many situations on KVM:
- High chance of freezes when enabling Jailhouse
- I lose devices if I don't reroute interrupts to CPU0 before I
  create cells
- Cell destroy doesn't work. We freeze, and after a while: "Ignoring
  NMI IPI to CPU 1"
- Starting a cell causes exceptions inside Jailhouse
So Jailhouse definitely runs more stably on bare metal than on
QEMU/SVM. I need to find another way to debug this.
Ralf
>
> Jan
>
--
You received this message because you are subscribed to the Google Groups
"Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/jailhouse-dev/08841e36-df70-50e8-8094-426d72fead52%40oth-regensburg.de.
For more options, visit https://groups.google.com/d/optout.