Had another debugging session with a Lauterbach debugger. CPU0 is still alive but does not respond to interrupts: the root cell does not answer pings, and the serial console does not work either. When I change the inmate's WFI to a NOP in a live session on the target, the root cell immediately starts processing interrupts again. So once the WFI is hit, interrupts apparently either don't reach CPU0 or are not processed, which suggests that the inmate's WFI somehow blocks the hypervisor, or rather the root cell. Could this be a GIC routing problem?
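The HCR_EL2 value reported further down in the quoted thread, 0x28001B, is a configuration register value rather than an event code, so the ARM ARM lookup Tony suggests amounts to decoding it bit by bit. That decode also touches the GIC routing question, because the IMO and FMO bits control where physical IRQs and FIQs are taken. A minimal sketch, assuming the ARMv8.0-A layout of bits 0-31 (bits 32 and above, used by later extensions, are ignored); the helper names are hypothetical and not part of Jailhouse:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Names of HCR_EL2 bits 0..31 as defined for ARMv8.0-A (see the Arm ARM).
 * Bits 10-11 form the two-bit BSU field and are listed separately here.
 * Bits 32+ (later architecture extensions) are deliberately not decoded.
 */
static const char *const hcr_el2_bit_names[32] = {
	"VM",   "SWIO", "PTW",    "FMO",    "IMO", "AMO",   "VF",   "VI",
	"VSE",  "FB",   "BSU[0]", "BSU[1]", "DC",  "TWI",   "TWE",  "TID0",
	"TID1", "TID2", "TID3",   "TSC",    "TIDCP", "TACR", "TSW", "TPC",
	"TPU",  "TTLB", "TVM",    "TGE",    "TDZ", "HCD",   "TRVM", "RW",
};

/* Returns 1 if the given bit is set in the register value, else 0. */
static int hcr_bit_set(uint64_t hcr, unsigned int bit)
{
	return bit < 64 && ((hcr >> bit) & 1u);
}

/* Prints the names of all set bits in the low 32 bits; returns how many. */
static int hcr_el2_print_set_bits(uint64_t hcr)
{
	int count = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		if (hcr_bit_set(hcr, bit)) {
			printf("%s%s", count ? " " : "", hcr_el2_bit_names[bit]);
			count++;
		}
	}
	printf("\n");
	return count;
}
```

Calling `hcr_el2_print_set_bits(0x28001B)` prints `VM SWIO FMO IMO TSC TACR`. TWI (bit 13) and TWE (bit 14) are clear, so WFI and WFE executed at EL1/EL0 are not trapped to EL2, consistent with the statement that both bits are 0; RW (bit 31) is clear, so EL1 runs in AArch32, as expected for the 32-bit inmate. IMO and FMO are set, so physical IRQs and FIQs arriving while that CPU executes below EL2 are taken to EL2. The decode alone does not explain why CPU0 stops receiving interrupts.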
-----Original Message-----
From: Jan Kiszka
Sent: Tuesday, 23 July 2019 12:20
To: von Wiarda, Jan; Antonios Motakis (Tony); Mark Rutland
Cc: JailhouseMailingListe
Subject: Re: 64 bit Hypervisor crash at 32 bit WFI instruction

On 23.07.19 12:14, von Wiarda, Jan wrote:
> Hi!
>
> With
>
>     asm volatile("nop" : : : "memory");
>
> instead of
>
>     asm volatile("wfi" : : : "memory");
>
> it runs just fine.
>
>> Is the root cell cpu (CPU 0) specifically crashing with an unexpected
>> synchronous exit to Jailhouse? What is the output?
>
> No, CPU 0 crashes without any kind of console output, which makes
> debugging even more difficult. What I observe is that after hitting WFI, it
> continues to run for 1-2 seconds and then it stops. The last thing I see from
> the instrumented code is a printk() from arch_skip_instruction(), which means
> it was handling a SYS64 exit.

Maybe interrupts get stalled for the root cell - for whatever reason. Do you
have a hardware debugger to analyze the state of the CPUs? Or use QEMU...

Jan

>
>> This is a far shot, but maybe the code generated around the WFI is the
>> culprit?
>
> You might be right: when I place the WFI right after inmate_main(), CPU 0 does
> not starve. But it's completely strange, undefined behaviour: sometimes it
> crashes if I put the WFI right after a printk(), whereas right before the
> printk() it doesn't crash.
>
> Works:
>
> void inmate_main(void)
> {
>     ...
>     asm volatile("wfi" : : : "memory");
>     printk("IVSHMEM: Done setting up...\n");
>     printk("IVSHMEM: waiting for interrupt.\n");
>     //asm volatile("wfi" : : : "memory");
> }
>
> Does not work:
>
> void inmate_main(void)
> {
>     ...
>     //asm volatile("wfi" : : : "memory");
>     printk("IVSHMEM: Done setting up...\n");
>     printk("IVSHMEM: waiting for interrupt.\n");
>     asm volatile("wfi" : : : "memory");
> }
>
> I know this sounds completely strange, but I reproduced this multiple times.
> The compiler is:
>
> gcc version 6.3.0 20170516 (Debian 6.3.0-18)
>
> BR,
> Jan
>
> -----Original Message-----
> From: Antonios Motakis (Tony)
> Sent: Tuesday, 23 July 2019 06:40
> To: von Wiarda, Jan; Mark Rutland
> Cc: JailhouseMailingListe; Jan Kiszka
> Subject: Re: AW: 64 bit Hypervisor crash at 32 bit WFI instruction
>
> Hi Jan,
>
> On 22-Jul-19 7:11 PM, von Wiarda, Jan wrote:
>> Hi Mark,
>>
>> I'm not touching bit 13 or 14 in HCR_EL2; they're both 0. HCR_EL2 is the
>> same for 64 bit and 32 bit inmates when the crash happens, except for
>> HCR_RW_BIT, obviously. The HCR_EL2 value is 0x28001B at crash time.
>>
>
> It's quite an interesting crash that you have there; I wouldn't expect this
> to happen.
>
> The idea with trapping WFI/WFE is to be able to suspend a VM that is just
> waiting for something to happen. Since Jailhouse is a partitioning
> hypervisor, you shouldn't need to trap it, nor should its use normally
> influence the other cores. Yet something is amiss here.
>
> Is the root cell cpu (CPU 0) specifically crashing with an unexpected
> synchronous exit to Jailhouse? What is the output?
>
> I don't remember what event 0x28001B maps to; I would check the ARM ARM first
> to figure out what the unexpected event in CPU 0 was, for a clue to motivate
> further investigation.
>
> Additionally, this WFI code instructs the compiler that memory contents may
> change, so the ordering of generated instructions, inserted barriers, etc. is
> influenced. This is a far shot, but maybe the code generated around the WFI
> is the culprit?
> Maybe not, but I would try to rule it out:
> (a) First I'd try replacing the WFI with a nop, to observe the behavior
> without the WFI but without changing compiler behavior and maintaining any
> compiler barriers.
> (b) I would also try replacing it with an infinite loop ("b .") to get the
> inmate to wait forever at this position, and see what happens.
>
> Happy debugging :)
>
> Best regards,
> Tony
>

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
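Tony's experiments (a) and (b), plus the original WFI, differ only in the single instruction inside the inline asm. One way to switch between them without touching the surrounding inmate code is a build-time macro. This is a minimal sketch, assuming a hypothetical `INMATE_WAIT_MODE` define and `inmate_wait()` helper that do not exist in Jailhouse's inmate library. Because the asm text is opaque to the compiler, keeping the same `"memory"` clobber in all three variants should leave the generated code around the wait point unchanged. `__asm__ volatile` is used instead of `asm volatile` only so the snippet also builds in strict ISO C modes; on non-ARM hosts the instruction string is empty and just the compiler barrier remains.

```c
/*
 * Hypothetical build switch for the inmate's wait point (not an existing
 * Jailhouse option): 0 = WFI, 1 = NOP, 2 = "b ." (branch-to-self, spins
 * forever at this spot).
 */
#ifndef INMATE_WAIT_MODE
#define INMATE_WAIT_MODE 0
#endif

#if defined(__arm__) || defined(__aarch64__)
#  if INMATE_WAIT_MODE == 0
#    define INMATE_WAIT_INSN "wfi"
#  elif INMATE_WAIT_MODE == 1
#    define INMATE_WAIT_INSN "nop"
#  else
#    define INMATE_WAIT_INSN "b ."
#  endif
#else
/* Non-ARM host build: no instruction, only the compiler barrier remains. */
#  define INMATE_WAIT_INSN ""
#endif

/*
 * Every variant carries the same "memory" clobber, so the code the compiler
 * generates around the wait point should be the same in all three builds;
 * only the single instruction differs.
 */
static inline void inmate_wait(void)
{
	__asm__ volatile(INMATE_WAIT_INSN : : : "memory");
}
```

Replacing the final `asm volatile("wfi" : : : "memory");` in inmate_main() with `inmate_wait();` and building once with `-DINMATE_WAIT_MODE=1` and once with `-DINMATE_WAIT_MODE=2` would then correspond to experiments (a) and (b).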
