Looking at the Xen code used for the EC2 guest kernels, it does not overload
the generic spinlock struct with Xen data, so at least that cannot overflow.
That said, the whole Xen spinlock code there is a snapshot from quite a while
ago. I had been working on importing a number of changes to it, but the
result differs so much from the currently released code that moving forward
with it seems rather scary.

My first thought on seeing all these CPUs sitting in the hypercall was that
the callback/wakeup from there is failing. But it is also possible that the
notification about the lock being released is somehow never sent. The code
uses some sort of stacking list, and maybe the workload you found has a
better chance of getting that messed up...

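To make the second possibility a bit more concrete, here is a toy model of
how a release notification could get lost with a per-CPU stack of waited-for
locks. This is only a sketch of the suspicion, not the Xen code: the names
(Cpu, start_waiting, stop_waiting, release, scenario) and the "look only at
the top of the stack" bug are assumptions made up for illustration.

```python
# Toy model only: this is NOT the Xen spinlock code. All names are hypothetical.

class Cpu:
    def __init__(self, name):
        self.name = name
        self.waiting = []      # stack of locks this CPU is waiting for
        self.kicked = False    # set when a release notification arrives


def start_waiting(cpu, lock):
    """Push a lock onto the CPU's stack before it blocks in the hypervisor."""
    cpu.waiting.append(lock)


def stop_waiting(cpu):
    """Pop the innermost lock once the CPU has stopped waiting for it."""
    return cpu.waiting.pop()


def release(lock, cpus, top_only):
    """Notify the first CPU found waiting for `lock`; return it, or None.

    With top_only=True only the innermost entry of each stack is checked,
    so a CPU whose outer wait is for `lock` is overlooked (the assumed bug).
    """
    for cpu in cpus:
        entries = cpu.waiting[-1:] if top_only else cpu.waiting
        if lock in entries:
            cpu.kicked = True
            return cpu
    return None


def scenario(top_only):
    cpu = Cpu("cpu1")
    start_waiting(cpu, "A")    # task context: waiting for lock A
    start_waiting(cpu, "B")    # interrupt handler: now also waiting for lock B
    woken = release("A", [cpu], top_only)   # A is released in the meantime
    stop_waiting(cpu)          # interrupt handler is done with B
    # Report whether a notification was sent and what the CPU still waits for.
    return woken is not None, cpu.waiting
```

In this model scenario(True) leaves the CPU still waiting for A without ever
having been notified, while scenario(False) does send the notification.
Whether the real code can end up in a comparable state is exactly what would
need checking.
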
I am not sure what the best way forward would be: either isolate the
spinlock-related changes from the big update and try only those, or just take
a recent build of the big update and try that. The first option takes more
time and probably several iterations, while the second may bring in other
problems.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/929941

Title:
  Kernel deadlock in scheduler on m2.{2,4}xlarge EC2 instance

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/929941/+subscriptions
