Re: 64 bit Hypervisor crash at 32 bit WFI instruction

Jan Kiszka Fri, 02 Aug 2019 03:30:26 -0700

On 02.08.19 12:24, von Wiarda, Jan wrote:
> Had another debugging session with a Lauterbach debugger. CPU0 is still 
> alive but does not respond to interrupts: the cell does not answer pings, 
> and the serial console does not work either. When I change the inmate's WFI 
> to a NOP in a live session on the target, the root cell immediately starts 
> processing interrupts again. When the WFI is hit, interrupts apparently 
> either don't reach CPU0 or are not processed. This suggests that the 
> inmate's WFI somehow blocks the hypervisor, or rather the root cell. Could 
> this be a GIC routing problem?


Not sure if I asked this already, but I've seen this error pattern too often:
double-check that neither the root cell nor the non-root one accidentally got
direct (non-intercepted) access to any of the GIC resources.

Jan

> 
> -----Original Message-----
> From: Jan Kiszka [mailto:[email protected]] 
> Sent: Tuesday, July 23, 2019 12:20
> To: von Wiarda, Jan; Antonios Motakis (Tony); Mark Rutland
> Cc: JailhouseMailingListe
> Subject: Re: 64 bit Hypervisor crash at 32 bit WFI instruction
> 
> On 23.07.19 12:14, von Wiarda, Jan wrote:
>> Hi!
>>
>> With
>>
>> asm volatile("nop" : : : "memory");
>>
>> instead of
>>
>> asm volatile("wfi" : : : "memory");
>>
>> it runs just fine.
>>
>>> Is the root cell cpu (CPU 0) specifically crashing with an unexpected 
>>> synchronous exit to Jailhouse? What is the output?
>>
>> No, CPU 0 crashes without any console output, which makes debugging even 
>> more difficult. What I observe is that after hitting the WFI, it keeps 
>> running for 1-2 seconds and then stops. The last thing I see from the 
>> instrumented code is a printk() from arch_skip_instruction(), which means 
>> it was handling a SYS64 exit.
> 
> Maybe interrupts get stalled for the root cell, for whatever reason. Do you
> have a hardware debugger to analyze the state of the CPUs? Or use QEMU...
> 
> Jan
> 
>>
>>> This is a long shot, but maybe the code generated around the WFI is the 
>>> culprit?
>>
>> You might be right: when I place the WFI right after inmate_main(), CPU 0 
>> does not starve. But the behaviour is completely strange and seemingly 
>> undefined: sometimes it crashes if I put the WFI right after a printk(), 
>> whereas right before the printk() it doesn't crash.
>>
>> Works:
>>
>> void inmate_main(void)
>> {
>>              ...
>>              asm volatile("wfi" : : : "memory");
>>              printk("IVSHMEM: Done setting up...\n");
>>              printk("IVSHMEM: waiting for interrupt.\n");
>>              //asm volatile("wfi" : : : "memory");
>> }
>>
>> Does not work:
>>
>> void inmate_main(void)
>> {
>>              ...
>>              //asm volatile("wfi" : : : "memory");
>>              printk("IVSHMEM: Done setting up...\n");
>>              printk("IVSHMEM: waiting for interrupt.\n");
>>              asm volatile("wfi" : : : "memory");
>> }
>>
>> I know this sounds completely strange, but I reproduced this multiple 
>> times. The compiler is:
>>
>> gcc version 6.3.0 20170516 (Debian 6.3.0-18)
>>
>> BR,
>> Jan
>>
>> -----Original Message-----
>> From: Antonios Motakis (Tony) [mailto:[email protected]] 
>> Sent: Tuesday, July 23, 2019 06:40
>> To: von Wiarda, Jan; Mark Rutland
>> Cc: JailhouseMailingListe; Jan Kiszka
>> Subject: Re: Re: 64 bit Hypervisor crash at 32 bit WFI instruction
>>
>> Hi Jan,
>>
>> On 22-Jul-19 7:11 PM, von Wiarda, Jan wrote:
>>> Hi Mark,
>>>
>>> I'm not touching bit 13 or 14 in HCR_EL2; they're both 0. HCR_EL2 is the 
>>> same for 64-bit and 32-bit inmates when the crash happens, except for 
>>> HCR_RW_BIT, obviously. The HCR_EL2 value is 0x28001B at crash time.
>>>
>>
>> It's quite an interesting crash that you have there; I wouldn't expect this 
>> to happen.
>>
>> The idea behind trapping WFI/WFE is to be able to suspend a VM that is just 
>> waiting for something to happen. Since Jailhouse is a partitioning 
>> hypervisor, you shouldn't need to trap it, nor should its use normally 
>> influence the other cores. Yet something is amiss here.
>>
>> Is the root cell cpu (CPU 0) specifically crashing with an unexpected 
>> synchronous exit to Jailhouse? What is the output?
>>
>> I don't remember what event 0x28001B maps to; I would check the ARM ARM 
>> first to figure out what the unexpected event on CPU 0 was, as a clue for 
>> further investigation.
>>
>> Additionally, this WFI code tells the compiler that memory contents may 
>> change, which influences the ordering of generated instructions, inserted 
>> barriers, etc. This is a long shot, but maybe the code generated around the 
>> WFI is the culprit? Maybe not, but I would try to rule it out:
>> (a) First, I'd try replacing the WFI with a nop, to observe the behavior 
>> without the WFI while leaving the compiler behavior unchanged and keeping 
>> any compiler barriers.
>> (b) I would also try replacing it with an infinite loop ("b .") to make the 
>> inmate wait forever at this position, and see what happens.
>>
>> Happy debugging :)
>>
>> Best regards,
>> Tony
>>
> 

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

-- 
You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jailhouse-dev/93235ca9-0149-bdc8-4cf5-858bcda26ec8%40siemens.com.
