On 02.08.19 12:24, von Wiarda, Jan wrote:
> Had another debugging session with a Lauterbach debugger. CPU0 still lives,
> but does not respond to interrupts: the cell does not respond to pings,
> and the serial console does not work either. When I change the inmate's WFI
> to a NOP in a live session on the target, the root cell immediately starts
> to process interrupts again. When WFI is hit, interrupts apparently either
> don't reach CPU0 or are not processed, which suggests that the inmate's WFI
> somehow blocks the hypervisor, or rather the root cell. Could this be a
> GIC routing problem?
Not sure if I asked this already, but I've seen this error pattern too
often: double-check that neither the root cell nor the non-root one
accidentally got direct (non-intercepted) access to any of the GIC
resources.

Jan

>
> -----Original Message-----
> From: Jan Kiszka [mailto:[email protected]]
> Sent: Tuesday, 23 July 2019 12:20
> To: von Wiarda, Jan; Antonios Motakis (Tony); Mark Rutland
> Cc: JailhouseMailingListe
> Subject: Re: 64 bit Hypervisor crash at 32 bit WFI instruction
>
> On 23.07.19 12:14, von Wiarda, Jan wrote:
>> Hi!
>>
>> With
>>
>> asm volatile("nop" : : : "memory");
>>
>> instead of
>>
>> asm volatile("wfi" : : : "memory");
>>
>> it runs just fine.
>>
>>> Is the root cell cpu (CPU 0) specifically crashing with an unexpected
>>> synchronous exit to Jailhouse? What is the output?
>>
>> No, CPU 0 does not crash with any kind of console output, which makes
>> debugging even more difficult. What I observe is that after hitting WFI,
>> it continues to run for 1-2 seconds and then stops. The last thing I see
>> from the instrumented code is a printk() from arch_skip_instruction(),
>> which means it was handling a SYS64 exit.
>
> Maybe interrupts get stalled for the root cell - for whatever reason. Do you
> have a hardware debugger to analyze the state of the CPUs? Or use QEMU...
>
> Jan
>
>>
>>> This is a far shot, but maybe the code generated around the WFI is the
>>> culprit?
>>
>> You might be right: when I place WFI right after inmate_main(), CPU 0 does
>> not starve. But the behaviour is completely strange and seemingly
>> undefined; sometimes it crashes if I put the WFI right after a printk(),
>> whereas right before the printk() it doesn't crash.
>>
>> Works:
>>
>> void inmate_main(void)
>> {
>> ...
>> asm volatile("wfi" : : : "memory");
>> printk("IVSHMEM: Done setting up...\n");
>> printk("IVSHMEM: waiting for interrupt.\n");
>> //asm volatile("wfi" : : : "memory");
>> }
>>
>> Does not work:
>>
>> void inmate_main(void)
>> {
>> ...
>> //asm volatile("wfi" : : : "memory");
>> printk("IVSHMEM: Done setting up...\n");
>> printk("IVSHMEM: waiting for interrupt.\n");
>> asm volatile("wfi" : : : "memory");
>> }
>>
>> I know this sounds completely strange, but I reproduced it multiple
>> times. The compiler is:
>>
>> gcc version 6.3.0 20170516 (Debian 6.3.0-18)
>>
>> BR,
>> Jan
>>
>> -----Original Message-----
>> From: Antonios Motakis (Tony) [mailto:[email protected]]
>> Sent: Tuesday, 23 July 2019 06:40
>> To: von Wiarda, Jan; Mark Rutland
>> Cc: JailhouseMailingListe; Jan Kiszka
>> Subject: Re: Re: 64 bit Hypervisor crash at 32 bit WFI instruction
>>
>> Hi Jan,
>>
>> On 22-Jul-19 7:11 PM, von Wiarda, Jan wrote:
>>> Hi Mark,
>>>
>>> I'm not touching bit 13 or 14 in HCR_EL2, they're both 0. HCR_EL2 is the
>>> same for 64 bit and 32 bit inmates when the crash happens, except for
>>> HCR_RW_BIT, obviously. HCR_EL2 value is 0x28001B at crash time.
>>>
>>
>> It's quite an interesting crash that you have there; I wouldn't expect
>> this to happen.
>>
>> The idea with trapping WFI/WFE is to be able to suspend a VM that is just
>> waiting for something to happen. Since Jailhouse is a partitioning
>> hypervisor, you shouldn't need to trap it, nor should its use normally
>> influence the other cores. Yet something is amiss here.
>>
>> Is the root cell CPU (CPU 0) specifically crashing with an unexpected
>> synchronous exit to Jailhouse? What is the output?
>>
>> I don't remember what event 0x28001B maps to; I would check the ARM ARM
>> first to figure out what the unexpected event on CPU 0 was, for a clue to
>> motivate further investigation.
>>
>> Additionally, this WFI code instructs the compiler that memory contents
>> may change, so the ordering of generated instructions, inserted barriers,
>> etc. are influenced. This is a far shot, but maybe the code generated
>> around the WFI is the culprit?
>> Maybe not, but I would try to rule it out:
>> (a) First I'd try replacing the WFI with a NOP, to observe the behavior
>> without the WFI while keeping the compiler behavior, including any
>> compiler barriers, unchanged.
>> (b) I would also try replacing it with an infinite loop ("b .") to get the
>> inmate to wait forever at this position, and see what happens.
>>
>> Happy debugging :)
>>
>> Best regards,
>> Tony
>>
>
-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
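Jan Kiszka's suggestion above, making sure that neither cell has direct (non-intercepted) access to the GIC, boils down to checking that none of a cell's memory regions overlaps a GIC MMIO range. The C sketch below performs that check over plain half-open address ranges. It assumes the goal is that cell memory regions exclude all GIC register frames; the struct `mem_region`, the helper names, and the addresses in the usage note are hypothetical. A real Jailhouse cell config describes regions with `struct jailhouse_memory`, and the GIC base addresses are board-specific.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a cell memory region: only the
 * physical range matters for whether the cell can reach the GIC directly. */
struct mem_region {
	uint64_t phys_start;
	uint64_t size;
};

/* [a, a + alen) and [b, b + blen) overlap iff each range starts before
 * the other one ends. Empty ranges never overlap. Wrap-around at the top
 * of the 64-bit address space is not handled in this sketch. */
static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return alen != 0 && blen != 0 && a < b + blen && b < a + alen;
}

/* Returns the index of the first cell region that overlaps any GIC range,
 * or -1 if no region maps GIC registers directly. */
static int find_gic_overlap(const struct mem_region *regions, size_t nregions,
			    const struct mem_region *gic, size_t ngic)
{
	for (size_t i = 0; i < nregions; i++)
		for (size_t j = 0; j < ngic; j++)
			if (ranges_overlap(regions[i].phys_start, regions[i].size,
					   gic[j].phys_start, gic[j].size))
				return (int)i;
	return -1;
}
```

For example, with hypothetical GIC frames at 0x08000000 and 0x08010000 (0x10000 bytes each), a region list containing `{0x08000000, 0x20000}` at index 1 makes `find_gic_overlap()` return 1, while a list that avoids that window returns -1. Running the check over both the root cell's and the inmate's region lists would flag exactly the kind of accidental mapping Jan describes.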

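Regarding Tony's remark about what 0x28001B maps to: in the quoted message from 22 July it is the HCR_EL2 value reported at crash time, so it is a hypervisor configuration register rather than an exit reason (on AArch64 the cause of a trap to EL2 is reported in ESR_EL2). The C sketch below decodes it using HCR_EL2 bit positions from the Arm ARM. The table lists only a handful of named bits, and `decode_hcr()` and `hcr_test()` are hypothetical helpers for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A few HCR_EL2 fields (bit positions per the Arm ARM). */
struct hcr_bit {
	unsigned int pos;
	const char *name;
};

static const struct hcr_bit hcr_bits[] = {
	{ 0, "VM" },   /* stage-2 translation enabled */
	{ 1, "SWIO" }, /* set/way invalidation override */
	{ 3, "FMO" },  /* physical FIQs routed to EL2 */
	{ 4, "IMO" },  /* physical IRQs routed to EL2 */
	{ 13, "TWI" }, /* trap WFI to EL2 */
	{ 14, "TWE" }, /* trap WFE to EL2 */
	{ 19, "TSC" }, /* trap SMC to EL2 */
	{ 21, "TACR" },/* trap auxiliary control register accesses */
	{ 31, "RW" },  /* EL1 is AArch64 when set, AArch32 when clear */
};

/* Returns 1 if bit 'pos' is set in 'hcr', else 0. */
static int hcr_test(uint64_t hcr, unsigned int pos)
{
	return (int)((hcr >> pos) & 1u);
}

/* Prints each named bit as set/clear and returns any set bits that the
 * table above does not name (0 means every set bit was explained). */
static uint64_t decode_hcr(uint64_t hcr)
{
	uint64_t known = 0;
	size_t n = sizeof(hcr_bits) / sizeof(hcr_bits[0]);

	for (size_t i = 0; i < n; i++) {
		known |= UINT64_C(1) << hcr_bits[i].pos;
		printf("%-4s (bit %2u): %s\n", hcr_bits[i].name, hcr_bits[i].pos,
		       hcr_test(hcr, hcr_bits[i].pos) ? "set" : "clear");
	}
	return hcr & ~known;
}
```

For 0x28001B this reports VM, SWIO, FMO, IMO, TSC and TACR as set and TWI (bit 13), TWE (bit 14) and RW (bit 31) as clear, with no unexplained bits left over. That matches the statement that bits 13 and 14 are 0 and that this is the 32-bit (AArch32 EL1) inmate's value; with TWI clear, the inmate's WFI is not trapped to EL2 and executes natively on its core.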