On 11-09-18 15:13, Kees Bakker wrote:
> Hey,
>
> Every now and then we have one or more containers in state ERROR.
> Is there a clever method to recover from that, other than
> rebooting the LXD server?
>
> Killing the monitor and the forkstart does help. And also a kworker
> process (kworker/u16:0) is eating up one of the CPUs with 100% load.
> lxc info gives "error: Monitor is hung"
>
> I'm running Ubuntu 16.04 with BTRFS. The kernel is 4.15.0-33-generic

Today it happened once again. This time it is on an Ubuntu 18.04 system
with the LVM storage backend. The kernel is 4.15.0-34-generic.

We don't usually stop and start containers. When they run it is all nice
and dandy. But when we stop and start a container, there is a good chance
of triggering this ERROR situation. This time I needed to change the
profile to get a bigger root volume in the container.

There is an lxc monitor process hanging, and a kworker at 100% CPU load.
The "lxc start" command hangs. Now "lxc list" shows the container in
ERROR state, and "lxc info" shows

    Error: Monitor is hung

Killing the monitor does not help to recover from this situation. The only
thing I can do is reboot the LXD host. As you can imagine, this is
horrible, since there are several other containers running.

Christian told us that this is probably a kernel problem. "If it is a
kernel bug you're hitting there's nothing that LXD can do to help you."

What I would like to know:
* Are there more people who see this problem?
* If not, why are we hitting it so often?
* What kernel problem are we talking about?

LXD is great, but this problem is becoming a nightmare, snif.
-- 
Kees
_______________________________________________
lxc-users mailing list
http://lists.linuxcontainers.org/listinfo/lxc-users
