Hi,

On 08/06/2018 04:33 PM, Ralf Ramsauer wrote:
> [snip]
> 
> On 07/24/2018 05:58 PM, Jan Kiszka wrote:
>>>
>>> Don't know if I understand that race correctly: The problem is that a
>>> CPU which is already suspended might be suspended again for some other
>>> reason, right? To resolve this, we would have to synchronously wait in
>>> the second caller of arch_suspend_cpu(), until the first suspension is
>>> handled. Hmm, sounds more like a problem to be solved with a semaphore,
>>> rather than a lock: can there be a third caller of suspend_cpu?
>>
>> I didn't think all cases through, in fact, and there might be also no
>> issue because of the control_lock protection. However, understanding the
>> logic will definitely be easier if you know that after arch_resume_cpu
>> returned, the target CPU left the event loop for sure.
> 
> I've got a question on the control_lock/cpu_suspend/suspend_cpu logic:
> 
> Let's say we have a scenario with three CPUs, one target, two callers.
> Both callers want to suspend-resume the target.
> 
> Assume the first caller wins, and suspend-resumes the target CPU. Now
> the second caller comes around and wants to suspend the target. Now what
> happens, if the target is still in hypervisor context but already set
> its cpu_suspended to false, but didn't leave the hypervisor context yet?
> 
> In this case, the second caller successfully takes target's control_lock
> and will send an IPI to the target. Will this IPI arrive at the target
> if the target is still handling the previous IPI of the other caller?

Yeah, this seems to be the root of the issue. I added some test code,
and it looks like, at least on x86, we're losing the IPI if the target
CPU is about to leave hypervisor context and has already signalled that
it is no longer suspended.

The caller will get stuck here in arch_suspend_cpu:

if (!target_suspended) {
        arch_send_nmi_ipi(target_data);
        while (!target_data->cpu_suspended)
                cpu_relax();
}

The caller enters the branch because it thinks the target is not
suspended, sends an IPI that never arrives, and then spins forever
because the target never enters cpu_suspend.

  Ralf

> 
> 
> I'm currently testing the modified cpu register dump code (per-CPU
> regdump HC) with a hypercall gatling gun, and I constantly run into some
> nasty deadlocks (on qemu x86, didn't test arm yet) where
> arch_suspend_cpu() waits for a cpu to enter its suspend state and starves.
> 
> 
> Thanks
>   Ralf
> 
> 
> BTW: I have some patches in my queue to consolidate x86/arm
> arch_cpu_{suspend/resume}. The code path of both is basically equivalent.
> 
