>>>>> Stéphane Graber <[email protected]> writes:

<snip/>

    > Did you try "lxc-stop -n <container> -k" which is the upstream supported
    > way of forcefully killing a container?

Yes. As mentioned, I even tried lxc-stop -k -t <timeout>.

    > In theory lxc-stop sends SIGPWR, then waits 30s and sends SIGKILL to
    > init.

Ah, good. So maybe I didn't wait long enough on my last test, but I'm
pretty sure I did.

Now, while debugging this, I did try to kill the init process, since at
least compiz and X were listed as defunct.

    > If SIGKILL doesn't work, then you have much bigger problems
    > (typically kernel related).

That could very well be the case.

But then I would expect lxc-stop to fail with some error code and
respect the -t timeout. In that case, and only in that case, I could
fall back to a reboot.
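
To make that concrete, here is a minimal sketch of the behaviour I'd
like, not what lxc-stop does today. It assumes GNU coreutils timeout(1)
is available; the helper name stop_with_deadline and the container name
in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a stop command under a hard deadline and report
# the outcome, so the caller can decide whether a reboot is warranted.
# Usage: stop_with_deadline SECONDS COMMAND [ARGS...]
stop_with_deadline() {
    deadline=$1
    shift
    status=0
    # timeout(1) exits with 124 when the deadline expires; otherwise it
    # passes through the exit status of the wrapped command.
    timeout "$deadline" "$@" || status=$?
    case $status in
        0)   echo "stopped" ;;
        124) echo "timed-out" ;;
        *)   echo "failed" ;;
    esac
    return "$status"
}
```

Something like "stop_with_deadline 60 lxc-stop -n mycontainer -k" would
then return non-zero on a hang or an error, and only then would the
caller fall back to the reboot.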

    > So please try with -k, 

I did.

    > if that doesn't work,

It didn't.

    > please let me access one of those hanging machines so I can
    > confirm that it's not an LXC issue and that something in the
    > kernel is indeed making one of the tasks unkillable.

With pleasure, but that will have to wait :-/

I had to put the reboot hack in place to restore service, so we'll need
to plan an interruption to give you access (I don't think we can
reproduce that on a different host).

Also, I'll be sprinting this week and on vacation for the next 2 weeks.

But rest assured I'll get back to you ;)

So thanks a lot for the quick feedback (on the bug too!).

I'm pretty sure you're right about the deeper kernel issue: it matches
my tests last Friday. I couldn't kill the init process and I had issues
killing the other ones, so I had to reboot in the end.

And stay tuned, I'll ping you as soon as I can set up a reproducing env ;)

    Vincent

-- 
Mailing list: https://launchpad.net/~canonical-ci-engineering
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~canonical-ci-engineering
More help   : https://help.launchpad.net/ListHelp
