On 2018-08-08 15:12, Ralf Ramsauer wrote:
Hi Jan,

On 08/08/2018 08:21 AM, Jan Kiszka wrote:
On 2018-08-07 12:42, Ralf Ramsauer wrote:

On 08/06/2018 04:33 PM, Ralf Ramsauer wrote:

On 07/24/2018 05:58 PM, Jan Kiszka wrote:

Don't know if I understand that race correctly: The problem is that a
CPU which is already suspended might be suspended again for some other
reason, right? To resolve this, we would have to synchronously wait in
the second caller of arch_suspend_cpu(), until the first suspension is
handled. Hmm, sounds more like a problem to be solved with a
rather than a lock: can there be a third caller of suspend_cpu?

I didn't think all cases through, in fact, and there might be also no
issue because of the control_lock protection. However, understanding
logic will definitely be easier if you know that after arch_resume_cpu
returned, the target CPU left the event loop for sure.

I've got a question on the control_lock/cpu_suspend/suspend_cpu logic:

Let's say we have a scenario with three CPUs, one target, two callers.
Both callers want to suspend-resume the target.

Note that the existing suspend/resume logic is not designed for
concurrent usage. That's one reason why we suspend the root cell when
doing operations on other cells.

I think it should be designed for concurrent use, because there could be
concurrent misuse. Example: Think of the (silly) case, where two CPUs of
the root-cell try to concurrently destroy a cell. Or in general, try to
invoke concurrent management actions. E.g., via two parallel jailhouse
loads (the HV cell_set_loadable path). That's stupid but possible.

Try this snippet on QEMU x86. It simply parallelises cell destroy calls
in a way to have a higher chance hit the window. It triggers the race in
~3 of 5 runs on my system:

diff --git a/driver/cell.c b/driver/cell.c
index 67edb732..8c6e6c8b 100644
--- a/driver/cell.c
+++ b/driver/cell.c
@@ -399,12 +399,25 @@ int jailhouse_cmd_cell_start(const char __user *arg)
         return err;

+static int tmp_err;
+static void trigger_race_via_cell_destroy(void *info)
+       unsigned int cell_id = *(unsigned int*)info;
+       int err;
+       err = jailhouse_call_arg1(JAILHOUSE_HC_CELL_DESTROY, cell_id);
+       if (!err)
+               tmp_err = 0;
  static int cell_destroy(struct cell *cell)
         unsigned int cpu;
         int err;

-       err = jailhouse_call_arg1(JAILHOUSE_HC_CELL_DESTROY, cell->id);
+       tmp_err = -1;
+       on_each_cpu(trigger_race_via_cell_destroy, &cell->id, true);
+       err = tmp_err;
         if (err)
                 return err;

This test first of all causes a deadlock pretty early:

CPU 0                   CPU 1
-----                   -----
hypercall()             hypercall()
 cell_destroy()          cell_destroy()
  ...cell_suspend(root)   ...cell_suspend(root)
   arch_cpu_suspend(1)     arch_cpu_suspend(0)


That's kind of expected because we do not account for this concurrency with the management API. I thought that was documented in the API description but it seems there is some room for improvements.

I'm still undecided if we should address such races beyond that.


You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jailhouse-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to