Re: AMD: non-root linux inmates

Jan Kiszka Fri, 05 Jul 2019 05:37:40 -0700

On 05.07.19 14:34, Ralf Ramsauer wrote:



On 7/5/19 8:55 AM, Jan Kiszka wrote:

On 04.07.19 22:56, Ralf Ramsauer wrote:

On 7/4/19 5:24 PM, Jan Kiszka wrote:

On 04.07.19 17:18, Ralf Ramsauer wrote:



On 7/4/19 4:39 PM, Jan Kiszka wrote:

On 04.07.19 15:43, Ralf Ramsauer wrote:

Hi,

we have some trouble starting non-root Linux on an AMD box. I already
tried to narrow things down, but it raised several questions.


The main problem is that non-root Linux tries to write to LVT0, and
Jailhouse crashes with:

      FATAL: Setting invalid LVT delivery mode (reg 35, value 00000700)


Turns out, in comparison to Intel x86, we don't trap on APIC reads, we
only intercept APIC writes on AMD (cf. svm.c:338). I thought this would
be the cause of this bug, as that's an obvious difference between Intel
and AMD: on VMX, we do trap xAPIC reads and writes. However, VMX works
slightly differently in these regards (side note: [1]).

xAPIC reads on AMD systems don't trap to the hypervisor, so I intercepted
reads (by removing the present bit of the XAPIC_PAGE of the guest), and
forwarded the traps to the apic dispatcher (adjusted VMEXIT_NPF).

I can confirm that we now trap reads as well as writes. But the non-root
Linux still crashes with the same error.

Digging a bit deeper, I found out that xAPIC reads are directly
forwarded to the hardware when they are intercepted. So this explains
why the bug still remains. This raised another question regarding xAPIC
handling on Intel:

      On AMD, we don't intercept xAPIC reads. On Intel, we do, as we
      follow the strategy mentioned in [1]… But why?


It accelerates write dispatching at least. I never did the comparison
if using a different access scheme would be beneficial because xAPIC is
practically dead on Intel.


Hmm... The change and benchmark should be pretty easy. Once a bunch of
other issues is solved, I'll maybe have a look at this.


As I said: you will optimize a legacy code path, not practically
relevant. If that will simplify the code, though, I might still be
interested :).


      Wouldn't it be more performant to just trap on xAPIC writes on
      Intel? This could be done by switching from APIC_ACCESS interception
      to simple write-only trap & emulate (page faults).

However, back to the initial issue. Looks like the difference between
Intel and AMD boot is as follows.

AMD:
[    0.325578] Switched APIC routing to physical flat.
[    0.366464] enabled ExtINT on CPU#0

Intel:
[    0.099486] Switched APIC routing to physical flat.
[    0.113000] masked ExtINT on CPU#0


This is why the above-mentioned Jailhouse crash occurs. I tried to find
out why Linux takes this decision on AMD. Our victim is in apic.c:1587.

On Intel, apic_read(LVT0) & APIC_LVT_MASKED results in 65536, on AMD it
is 0. This is why we take a different path.

Now the question is simple -- why? :-)

Are we just lacking ExtINT delivery mode in Jailhouse, or is anything
else odd?


Yes, the ExtINT makes no sense for secondary cells, and it should also
not be needed for primary ones. Let's dig deeper:

value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
if (!cpu && (pic_mode || !value || skip_ioapic_setup)) {
       value = APIC_DM_EXTINT;
       apic_printk(APIC_VERBOSE, "enabled ExtINT on CPU#%d\n", cpu);

What are the values here, and which are different?


As already mentioned above, only value differs:

On Intel, apic_read(LVT0) & APIC_LVT_MASKED results in 65536, on AMD
it is 0. This is why we take a different path.


cpu, pic_mode and skip_ioapic_setup are 0 on both machines.
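
To make the numbers above concrete, here is a minimal sketch that decodes the
two LVT0 values and replays the quoted condition. The two constants match the
values seen in this thread (65536 and 00000700) and, as far as I know, the
definitions in Linux's apicdef.h. The helper names (lvt_is_masked,
lvt_delivery_mode, takes_extint_path) are made up for illustration and exist
neither in Linux nor in Jailhouse.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit 16 of an LVT entry is the mask bit: 1 << 16 == 0x10000 == 65536. */
#define APIC_LVT_MASKED (1u << 16)
/* Delivery mode ExtINT: 111b in bits 10:8, i.e. the 00000700 from the log. */
#define APIC_DM_EXTINT  0x00700u

/* Hypothetical helper: is the LVT entry masked? */
static inline int lvt_is_masked(uint32_t lvt)
{
	return (lvt & APIC_LVT_MASKED) != 0;
}

/* Hypothetical helper: extract the delivery mode field (bits 10:8). */
static inline unsigned int lvt_delivery_mode(uint32_t lvt)
{
	return (lvt >> 8) & 0x7;
}

/*
 * Hypothetical helper mirroring the condition quoted from apic.c:
 * returns 1 if Linux would program ExtINT into LVT0, 0 otherwise.
 */
static inline int takes_extint_path(int cpu, int pic_mode,
				    int skip_ioapic_setup, uint32_t lvt0)
{
	uint32_t value = lvt0 & APIC_LVT_MASKED;

	return !cpu && (pic_mode || !value || skip_ioapic_setup);
}

/* Print how a given LVT0 read-back would be interpreted on CPU 0. */
static inline void lvt0_report(const char *label, uint32_t lvt0)
{
	assert(label != NULL);
	printf("%s: LVT0=0x%05x masked=%d -> %s ExtINT\n", label,
	       (unsigned int)lvt0, lvt_is_masked(lvt0),
	       takes_extint_path(0, 0, 0, lvt0) ? "enabled" : "masked");
}
```

With cpu, pic_mode and skip_ioapic_setup all 0, the mask bit alone decides:
lvt0_report("Intel", 0x10000) reports the masked path, lvt0_report("AMD", 0x0)
reports the ExtINT path that Jailhouse then rejects.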


Ah, ok. Then you need to find the evil guy unmasking LVT0 before that.
Can't be Jailhouse: we hand it over masked.


Yes, I checked this. Actually we do. But...

When the cell is created after jailhouse is enabled, apic_clear() will
be called when the SIPI is received. There, I added some
instrumentation. At that moment, LVT0 holds (and keeps) 0x10000.

In addition to that, I instrumented the linux-loader. There, I read back
LVT0. Very early, before we hand over to Linux. No one else touches LVT0
in the meanwhile. I would see any other guest access as interceptions
are instrumented (both read and write).

So in the linux-loader, the read back causes a vmexit, and I read back
0x0.  That's really strange, there is - afaict - no other access in the
meanwhile.

I don't know what's going on there. I don't see any other modifications
of LVT registers in code paths other than apic_clear().


Maybe you can lift the setup into KVM and check if you can reproduce
there as well. That will allow you to track down the other access that
does the enabling. It shouldn't be possible that the hardware does that
on its own.


Tried to run Jailhouse on QEMU on an AMD machine with nested KVM.

I currently see no way to test this on qemu, as Jailhouse seems to be
pretty unstable. We horribly crash in many situations on kvm:

  - High chance of freezes when enabling jailhouse
  - I lose devices if I don't reroute interrupts to CPU0 before I
    create cells
  - cell destroy doesn't work. We freeze and after a while: "Ignoring NMI
    IPI to CPU 1"
  - Starting causes exceptions inside jailhouse

So Jailhouse definitely runs more stable on bare-metal than on qemu/SVM.
I need to find another way to debug this.


OK...

Next strategy: Frequent read-back and validation of the APIC state. That may
help to narrow down the point where the bit flips. Make sure you read on the
right CPU, though.
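
A rough sketch of what such a read-back checkpoint could look like, to be
sprinkled along the path between apic_clear() and the hand-over to Linux.
Everything here is hypothetical: struct lvt0_watch, lvt0_checkpoint() and the
reader callback are invented names, and the real register read (hypervisor
side or in the linux-loader) would have to be plugged in as the callback. The
fake reader only exists so the sketch can be exercised stand-alone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define APIC_LVT_MASKED (1u << 16)

/* Hypothetical accessor type: returns the current LVT0 value. */
typedef uint32_t (*lvt0_reader_t)(void);

/* Hypothetical per-CPU tracking state; zero-initialize before first use. */
struct lvt0_watch {
	uint32_t last;      /* value seen at the previous checkpoint */
	int valid;          /* 0 until the first checkpoint has run */
	unsigned int flips; /* number of mask-bit changes observed */
};

/*
 * Read LVT0 and compare its mask bit with the previous checkpoint.
 * Returns 1 if the mask bit changed since the last call, 0 otherwise.
 * 'cpu' is only used for logging: the caller must make sure this runs
 * on the CPU whose APIC is being inspected.
 */
static inline int lvt0_checkpoint(struct lvt0_watch *w, lvt0_reader_t read_lvt0,
				  int cpu, const char *tag)
{
	assert(w != NULL && read_lvt0 != NULL && tag != NULL);

	uint32_t now = read_lvt0();
	int changed = w->valid && ((now ^ w->last) & APIC_LVT_MASKED) != 0;

	if (changed) {
		printf("CPU %d: LVT0 mask bit flipped at '%s': 0x%05x -> 0x%05x\n",
		       cpu, tag, (unsigned int)w->last, (unsigned int)now);
		w->flips++;
	}
	w->last = now;
	w->valid = 1;
	return changed;
}

/* Stand-in for the real register access, for stand-alone testing only. */
static uint32_t fake_lvt0;
static inline uint32_t fake_read_lvt0(void)
{
	return fake_lvt0;
}
```

The first checkpoint after which lvt0_checkpoint() returns 1 brackets the
window in which the mask bit was lost, so adding checkpoints inside that
window narrows it further.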


Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
