Re: [lxc-users] lxc-users Digest, Vol 16, Issue 2

Thouraya TH Wed, 02 Apr 2014 08:48:38 -0700

Thanks a lot for the answer.

Yes, I’d like to checkpoint a container, migrate it to another node, and
restart it. Is that possible with CRIU? (I mean the checkpoint and the
restart.)


My second question: without CRIU, if I restart a container from an
lxc-snapshot snapshot on another node, does it restart correctly?

Thanks a lot.

Bests.


2014-04-02 13:00 GMT+01:00 <[email protected]>:

> Today's Topics:
>
>    1. Snapshot of a LXC container (Thouraya TH)
>    2. Re: Snapshot of a LXC container (Rami Rosen)
>    3. Re: lxc_monitor exiting, but not cleaning monitor-fifo?
>       (Serge Hallyn)
>    4. Re: lxc_monitor exiting, but not cleaning monitor-fifo?
>       (Florian Klink)
>
>
> ---------- Forwarded message ----------
> From: Thouraya TH <[email protected]>
> To: [email protected]
> Cc:
> Date: Tue, 1 Apr 2014 16:34:01 +0100
> Subject: [lxc-users] Snapshot of a LXC container
> Hello,
>
> Please, I have another question about "Snapshot of a LXC container".
> What's the difference between a and b?
> a) stop the container, migrate it to another machine, and restart it
> b) snapshot with a virtual machine on the container, migrate the
> snapshot, restart
> (system-level solution)
>
>
>
>
> Thank you so much.
> Bests.
>
>
> ---------- Forwarded message ----------
> From: Rami Rosen <[email protected]>
> To: LXC users mailing-list <[email protected]>
> Cc:
> Date: Tue, 1 Apr 2014 19:12:11 +0300
> Subject: Re: [lxc-users] Snapshot of a LXC container
>
> Hi,
> First, I assume that by stopping and starting the container you mean
> checkpointing and restoring it with CRIU.
>
> The difference is that lxc-snapshot saves only the filesystem state
> of a container, whereas a checkpoint saves the full state of the
> container process and its children.
>
> Regards,
> Rami Rosen
> http://ramirose.wix.com/ramirosen
>
>
> ---------- Forwarded message ----------
> From: Serge Hallyn <[email protected]>
> To: LXC users mailing-list <[email protected]>
> Cc: [email protected]
> Date: Tue, 1 Apr 2014 13:01:36 -0500
> Subject: Re: [lxc-users] lxc_monitor exiting, but not cleaning
> monitor-fifo?
> As an alternative to pidfiles, how about doing it the way
> lxcapi_create does, with fcntl(fd, F_SETLKW, ...)? (See
> create_partial() and ongoing_create().)
>
> Then if the monitor exited without being able to clean up, we can
> detect it and clean up.
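
The locking idea above can be sketched in a few lines. This is not LXC's
code, and it does not reproduce create_partial() or ongoing_create(); it is
a small Python illustration, with a hypothetical lock-file path and
hypothetical function names, of why an fcntl lock works as a liveness
signal: the kernel drops the lock when the holder exits, even after a hard
kill, so a client that can take the lock knows no monitor is alive.

```python
import fcntl
import os


def hold_monitor_lock(lock_path):
    """Monitor side (hypothetical): take an exclusive lock and keep the fd open.

    fcntl.lockf() without LOCK_NB blocks until the lock is granted,
    which corresponds to fcntl(fd, F_SETLKW, ...) in C.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.lockf(fd, fcntl.LOCK_EX)
    return fd  # the lock lives as long as this process keeps the file open


def monitor_lock_is_held(lock_path):
    """Client side (hypothetical): True if another process holds the lock."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking attempt: fails with EACCES/EAGAIN if someone holds it.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        return True
    else:
        # We got the lock, so no live holder exists; release it again.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

A client that sees monitor_lock_is_held(...) return False while the fifo
still exists could treat the fifo as stale. Note that these locks are per
process, so the check only makes sense from a process other than the holder.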
>
>
>
> ---------- Forwarded message ----------
> From: Florian Klink <[email protected]>
> To: Dwight Engen <[email protected]>, LXC users mailing-list <
> [email protected]>
> Cc:
> Date: Tue, 01 Apr 2014 22:15:25 +0200
> Subject: Re: [lxc-users] lxc_monitor exiting, but not cleaning
> monitor-fifo?
> On 01.04.2014 01:49, Dwight Engen wrote:
> > On Mon, 31 Mar 2014 23:18:13 +0200
> > Florian Klink <[email protected]> wrote:
> >
> >> On 31.03.2014 21:13, Dwight Engen wrote:
> >>> On Mon, 31 Mar 2014 20:34:15 +0200
> >>> Florian Klink <[email protected]> wrote:
> >>>
> >>>> On 31.03.2014 20:10, Dwight Engen wrote:
> >>>>> On Sat, 29 Mar 2014 23:39:33 +0100
> >>>>> Florian Klink <[email protected]> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> when running multiple lxc actions in a row using the command-line
> >>>>>> tools, I sometimes observe the following state:
> >>>>>>
> >>>>>>
> >>>>>> - lxc-monitord is not running anymore
> >>>>>> - /run/lxc/var/lib/lxc/monitor-fifo still exists, but is
> >>>>>> "refusing connection"
> >>>>>>
> >>>>>> In the logs, I then see the following:
> >>>>>>
> >>>>>>
> >>>>>> lxc-start 1395671045.703 ERROR    lxc_monitor - connect : backing off 10
> >>>>>> lxc-start 1395671045.713 ERROR    lxc_monitor - connect : backing off 50
> >>>>>> lxc-start 1395671045.763 ERROR    lxc_monitor - connect : backing off 100
> >>>>>> lxc-start 1395671045.864 ERROR    lxc_monitor - connect : Connection refused
> >>>>>>
> >>>>>>
> >>>>>> ... and the command fails.
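
The retry pattern visible in the log above can be approximated as follows.
This is only a sketch, not LXC's implementation: it assumes the client
connects to a Unix stream socket, the function name and path argument are
hypothetical, and the delays (10, 50, 100, read here as milliseconds) are
taken from the "backing off" messages.

```python
import socket
import time

BACKOFF_MS = (10, 50, 100)  # assumed from the "backing off" log messages


def connect_with_backoff(path, delays_ms=BACKOFF_MS):
    """Connect to a Unix stream socket, sleeping between refused attempts."""
    attempts = 0
    while True:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except ConnectionRefusedError:
            sock.close()
            if attempts == len(delays_ms):
                # Every backoff step is used up: give up with
                # "Connection refused", like the last log line.
                raise
            time.sleep(delays_ms[attempts] / 1000.0)
            attempts += 1
```

A socket file left behind by a process that no longer listens refuses every
attempt, so this function fails after the last delay no matter how long the
backoff is; only removing the stale file (or restarting the listener) helps.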
> >>>>>
> >>>>> The only time I've seen this happen is if lxc-monitord is hard
> >>>>> killed so it doesn't have a chance to clean up and remove the
> >>>>> socket.
> >>>>
> >>>> Here, it's happening quite frequently. However, the script never
> >>>> kills lxc-monitord on its own, it just tries to detect and fix
> >>>> this state by removing the socket file...
> >>>
> >>> Right, removing the socket file makes it so another lxc-monitord
> >>> will start, but the question is why is the first one exiting without
> >>> cleaning up? Can you reliably reproduce it at will? If so then maybe
> >>> you could attach an strace to lxc-monitord and see why it is
> >>> exiting.
> >>
> >> I was so far not successful in reproducing the bug while having an
> >> strace running. :-( But I'll continue to try!
>
> Success :-) I managed to get an strace while trying to reproduce the
> bug. I gzipped and attached it to this mail.
>
> It's the output of strace -f -s 200 /usr/lib/lxc/lxc-monitord
> /var/lib/lxc /run/lxc/var/lib/lxc/monitor-fifo &> strace_output.txt
>
> I fired a bunch of lxc-starts and lxc-stops in a row, then stopped my
> script and waited for lxc-monitord (and strace too) to stop.
>
> Then I started my script again and had the "leftover monitor-fifo state".
>
> >>>
> >>>>>
> >>>>>>
> >>>>>> A possible workaround would be to check, before running the
> >>>>>> next lxc command, whether lxc-monitord is no longer running while
> >>>>>> the monitor-fifo file still exists, and to remove the fifo in that
> >>>>>> case, but that's ugly ;-)
> >>>>>
> >>>>> Is there a good non-racy way to do this? I guess monitord could
> >>>>> write its pid in $LXCPATH and we could kill(pid, 0) it.
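
The pidfile idea above might look roughly like this. It is a sketch under
assumptions, not something LXC does: the pidfile, its location, and both
function names are hypothetical. Signal 0 delivers nothing; it only asks
the kernel whether the pid exists.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this pid exists (kill(pid, 0) check)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True   # EPERM: the process exists but belongs to another user
    return True


def remove_stale_fifo(pidfile, fifo):
    """Remove fifo if pidfile is missing, unreadable, or names a dead process.

    Returns True if the fifo was removed.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        pid = None
    # pid <= 0 would address process groups, so treat it as invalid.
    if pid is not None and pid > 0 and pid_is_running(pid):
        return False
    try:
        os.unlink(fifo)
    except FileNotFoundError:
        return False
    return True
```

This does not fully answer the "non-racy" question: the pid may be reused
by an unrelated process, and the monitor may exit between the check and the
next command, so the result is only a hint.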
> >>
> >> I also think that lxc should be able to recover from this problem
> >> automatically.
> >
> > I agree, though I would like to understand the root cause. Can you try
> > out the attached patch? I think it will cure your issues.
> >
>
> Thanks for the patch! Just tell me if you need more information for the
> strace above. If not, I'll happily apply the patch :-)
>
> >>>>>
> >>>>>> Is this behaviour known? Is there some missing "cleanup code" in
> >>>>>> lxc(_monitord) or why is it failing like this?
> >>>>>
> >>>>> Currently it catches SIGILL, SIGSEGV, SIGBUS, and SIGTERM and
> >>>>> cleans up. Other than hard kill I'm not sure what else might
> >>>>> cause it to exit without cleaning up.
> >>>>
> >>>> I shut down containers with `lxc-stop -n container-name`
> >>>> (lxc.stopsignal=30 (SIGPWR)), however this signal should never go
> >>>> to lxc_monitord, right?
> >>>
> >>> Right, that goes to the init process of the container.
>
>
> _______________________________________________
> lxc-users mailing list
> [email protected]
> http://lists.linuxcontainers.org/listinfo/lxc-users
>
