On 29.09.21 11:12, Jan Kiszka wrote:
On 29.09.21 09:50, Martin Kaistra wrote:
On 28.09.21 11:28, Jan Kiszka wrote:
On 27.09.21 10:30, Martin Kaistra wrote:
On 24.09.21 17:46, Jan Kiszka wrote:

If suspend_cpu() does not progress, the target CPU is not reacting
properly to the request to leave the guest and service the Jailhouse
commands. Possibly your interrupts are not handled properly. Run
"jailhouse config check" on your setup; maybe you are passing the
interrupt controller through.

Or are you using SDEI-based management interrupts? That would require a
special TF-A version, so it is unlikely to happen "by chance".

Jan


Hi Jan,

"jailhouse config check" finds no problems with the root cell and inmate
configs.
Also, SDEI is not active. gicv2_send_sgi() is being used.


Then it would be good to continue debugging, now trying to understand
what the target CPU is doing.

The CPU that requests the suspend sets suspend_cpu in the target data
structure, then sends an IPI to that CPU and waits for the other side to
confirm this by setting cpu_suspended. Check whether the target CPU
received the IPI, left guest mode, or what else it is doing, by
instrumenting the related code paths (check_events() on arm64).
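The handshake described above can be modeled in user space as a rough sketch, assuming two POSIX threads stand in for cpu0 and cpu1 and an atomic flag stands in for the SGI. The struct cpu_state, the send_event(), target_cpu() and run_handshake() helpers, and the timeout are all hypothetical; only the names suspend_cpu and cpu_suspended come from the description. It illustrates why a lost IPI leaves the requester spinning: nothing else ever sets cpu_suspended.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-CPU state. suspend_cpu and cpu_suspended follow the
 * description in the thread; this is not Jailhouse's actual struct. */
struct cpu_state {
	atomic_bool suspend_cpu;   /* request, set by the requesting CPU */
	atomic_bool cpu_suspended; /* confirmation, set by the target CPU */
	atomic_bool ipi_pending;   /* stands in for the SGI */
	atomic_bool stop;          /* ends the simulation */
};

/* Stands in for arch_send_event(); deliver == false models a lost SGI. */
static void send_event(struct cpu_state *target, bool deliver)
{
	if (deliver)
		atomic_store(&target->ipi_pending, true);
}

/* Target CPU: stays "in the guest" until an IPI arrives. */
static void *target_cpu(void *arg)
{
	struct cpu_state *cpu = arg;

	while (!atomic_load(&cpu->stop)) {
		/* Taking the IPI corresponds to leaving the guest and
		 * running check_events(), which confirms the request. */
		if (atomic_exchange(&cpu->ipi_pending, false) &&
		    atomic_load(&cpu->suspend_cpu))
			atomic_store(&cpu->cpu_suspended, true);
		sched_yield();
	}
	return NULL;
}

static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Requesting CPU: set the request, kick the target, wait for the
 * confirmation. The timeout exists only so this demo terminates; a
 * loop without one spins forever if the IPI never arrives. */
static bool suspend_cpu(struct cpu_state *target, bool deliver,
			double timeout_ms)
{
	double start = now_ms();

	atomic_store(&target->suspend_cpu, true);
	send_event(target, deliver);
	while (!atomic_load(&target->cpu_suspended)) {
		if (now_ms() - start > timeout_ms)
			return false;
		sched_yield();
	}
	return true;
}

/* Runs one handshake against a simulated target CPU thread. */
static bool run_handshake(bool deliver, double timeout_ms)
{
	struct cpu_state cpu;
	pthread_t thread;
	bool ok;

	assert(timeout_ms > 0);
	atomic_init(&cpu.suspend_cpu, false);
	atomic_init(&cpu.cpu_suspended, false);
	atomic_init(&cpu.ipi_pending, false);
	atomic_init(&cpu.stop, false);

	if (pthread_create(&thread, NULL, target_cpu, &cpu) != 0)
		return false;
	ok = suspend_cpu(&cpu, deliver, timeout_ms);
	atomic_store(&cpu.stop, true);
	pthread_join(thread, NULL);
	return ok;
}
```

Because the request flag is stored before the simulated IPI (both with sequentially consistent atomics), a target that observes the IPI is guaranteed to also observe the request.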

Jan


When there is no freeze, I can see that after cpu0 calls
arch_send_event() -> gicv2_send_sgi() from suspend_cpu(), cpu1 runs
irqchip_handle_irq() -> arch_handle_sgi() -> check_events().

However, in the failing case, after cpu0 enters suspend_cpu(), no
interrupts seem to arrive on cpu1: I get no debug prints from
irqchip_handle_irq() or check_events().
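On GICv2, sending an SGI comes down to a single write to the distributor's GICD_SGIR register (offset 0xF00). The sketch below encodes that value according to the GICv2 architecture specification; gicv2_sgir_value() and sgir_targets() are hypothetical helpers, not Jailhouse's gicv2_send_sgi(). Note that CPUTargetList selects GIC CPU interface numbers, which are not guaranteed to match logical CPU IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICD_SGIR is at offset 0xF00 from the GICv2 distributor base. */
#define GICD_SGIR_OFFSET 0xf00u

/* TargetListFilter field, GICD_SGIR bits [25:24] */
#define SGIR_FILTER_LIST         0u /* only the CPUs in CPUTargetList */
#define SGIR_FILTER_ALL_BUT_SELF 1u /* every CPU except the sender */
#define SGIR_FILTER_SELF         2u /* only the sender */

/* Hypothetical helper: build the value written to GICD_SGIR.
 * target_list: bit n selects GIC CPU interface n (bits [23:16]).
 * sgi_id: SGI number 0..15 (bits [3:0]).
 * NSATT (bit 15) is left at 0. */
static uint32_t gicv2_sgir_value(uint32_t filter, uint8_t target_list,
				 uint8_t sgi_id)
{
	assert(filter <= SGIR_FILTER_SELF && sgi_id < 16);
	return ((filter & 0x3u) << 24) |
	       ((uint32_t)target_list << 16) |
	       ((uint32_t)sgi_id & 0xfu);
}

/* Hypothetical instrumentation helper: does a list-filtered SGIR
 * value name GIC CPU interface n? */
static bool sgir_targets(uint32_t sgir, unsigned int n)
{
	uint32_t filter = (sgir >> 24) & 0x3u;
	uint32_t list = (sgir >> 16) & 0xffu;

	if (filter != SGIR_FILTER_LIST || n >= 8)
		return false; /* other filters depend on the sender */
	return (list >> n) & 1u;
}
```

Printing the value actually written in the broken case, next to the CPU it is meant for, could show whether cpu0's write really names cpu1's interface.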

But is arch_send_event() also called in the broken case?

And in both cases cpu1 is inside the guest when the suspension request
is started?

Yes, arch_send_event() is also called in the broken case. These are the logs with my added annotations:

broken case:

....
Activating hypervisor
psci_dispatch: 0xc4000001
[   18.583357] The Jailhouse is opening.
gicv2_send_sgi: cpu 0
irqchip_handle_irq (sgi, cpu 1)
gicv2_send_sgi: cpu 0
gicv2_send_sgi: cpu 1
gicv2_send_sgi: cpu 0
irqchip_handle_irq (sgi, cpu 1)
irqchip_handle_irq (sgi, cpu 0)
irqchip_handle_irq (sgi, cpu 1)
gicv2_send_sgi: cpu 0
irqchip_handle_irq (sgi, cpu 1)
....
[   18.681300] CPU1: shutdown
psci_dispatch: 0x84000002
psci_dispatch: 0xc4000004
[   18.688683] psci: CPU1 killed (polled 4 ms)
[   18.693551] All CPUs removed!
cell_suspend: Running on cpu #0
About to suspend cpu #1
suspend_cpu()
arch_send_event
gicv2_send_sgi: cpu 0
suspend_cpu() loop

=========================================================

working case:

....
Activating hypervisor
gicv2_send_sgi: cpu 1
irqchip_handle_irq (sgi, cpu 0)
gicv2_send_sgi: cpu 1
irqchip_handle_irq (sgi, cpu 0)
[   17.908806] The Jailhouse is opening.
gicv2_send_sgi: cpu 1
gicv2_send_sgi: cpu 0
irqchip_handle_irq (sgi, cpu 1)
irqchip_handle_irq (sgi, cpu 0)
gicv2_send_sgi: cpu 1
gicv2_send_sgi: cpu 0
irqchip_handle_irq (sgi, cpu 1)
irqchip_handle_irq (sgi, cpu 0)
....
psci_dispatch: 0x84000002
[   18.008498] CPU1: shutdown
psci_dispatch: 0xc4000004
[   18.014133] psci: CPU1 killed (polled 4 ms)
[   18.019385] All CPUs removed!
cell_suspend: Running on cpu #0
About to suspend cpu #1
suspend_cpu()
arch_send_event
gicv2_send_sgi: cpu 0
suspend_cpu() loop
irqchip_handle_irq (sgi, cpu 1)
check_events: running on cpu #1
Created cell "inmate-demo"
Page pool usage after cell creation: mem 62/992, remap 37/131072
[   18.055831] Created Jailhouse cell "inmate-demo"

Maybe there is a HW problem? But then why does it seem to work sometimes?


I would call for a HW problem only after truly excluding all software
issues.

Jan


--
You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jailhouse-dev/31bc8f60-9f99-79c5-77ae-59482f7bd92a%40linutronix.de.
