I think the best we can do here is _try_ to send a SIGKILL to every process in the cgroup (whether it's in R, T, S, or any other state) once we've written FROZEN to freezer.state and found that the cgroup is still in FREEZING. After each interval we'll keep writing FROZEN to freezer.state AND keep trying to send SIGKILL. Hopefully these two mechanisms will _eventually_ get everything cleaned up.
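To make the retry loop concrete, here is a rough sketch in Python. It is not the actual change to src/linux/cgroups.cpp. The names `FreezerCgroup` and `freeze_or_kill`, the use of `cgroup.procs` to list processes, and the default interval and attempt count are all assumptions made for illustration.

```python
import errno
import os
import signal
import time
from pathlib import Path


class FreezerCgroup:
    """Hypothetical thin wrapper over a cgroup freezer directory."""

    def __init__(self, path):
        self.path = Path(path)

    def request_freeze(self):
        # Writing FROZEN may fail with EBUSY when freezing is incomplete;
        # the cgroup then stays in FREEZING, so we just carry on.
        try:
            (self.path / "freezer.state").write_text("FROZEN")
        except OSError as e:
            if e.errno != errno.EBUSY:
                raise

    def state(self):
        return (self.path / "freezer.state").read_text().strip()

    def pids(self):
        # Assumption: cgroup.procs lists the process ids in the cgroup.
        return [int(p) for p in (self.path / "cgroup.procs").read_text().split()]


def freeze_or_kill(cgroup, interval=0.1, attempts=50,
                   kill=os.kill, sleep=time.sleep):
    """Retry FROZEN each interval; SIGKILL everything while still FREEZING.

    Returns True once the state reads FROZEN, False if attempts run out.
    """
    for _ in range(attempts):
        cgroup.request_freeze()
        if cgroup.state() == "FROZEN":
            return True
        # Still FREEZING: try to kill every process, whatever state it is in.
        for pid in cgroup.pids():
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the process exited between listing and killing
        sleep(interval)
    return False
```

Keeping `kill` and `sleep` as parameters is only there so the loop can be exercised against a fake cgroup without signalling real processes.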
How does that sound?

On Sat, Sep 22, 2012 at 10:09 AM, Jie Yu <[email protected]> wrote:

> Also, I don't understand what you mean here? Could you elaborate?
>
>
> If you have two running processes to kill, you cannot send a SIGKILL to
> them atomically. As a result, one process will be killed first (likely),
> and the other process is still making progress (though in a very short
> interval). That may cause unpredictable errors.
>
> - Jie
>
> On Sat, Sep 22, 2012 at 2:03 AM, Vinod Kone <[email protected]> wrote:
>
>> Thanks for digging up the kernel code Jie! It's fascinating.
>>
>>> Will that cause potential problems if there are more than 1 process in
>>> 'R' because the kill is not atomic.
>>
>> Also, I don't understand what you mean here? Could you elaborate?
>>
>> Vinod
>>
>>> - Jie
>>>
>>> On Fri, Sep 21, 2012 at 9:10 PM, Benjamin Hindman <[email protected]> wrote:
>>>
>>>> > On Sept. 21, 2012, 7 p.m., Vinod Kone wrote:
>>>> > > lgtm. i've a feeling we need to also do a force kill. but we can do
>>>> > > this after we see how brian's test pans out.
>>>>
>>>> I tried just setting FREEZING to the cgroup freezer.state manually and
>>>> that didn't seem to work. Meanwhile, I sent a SIGKILL to the process in
>>>> the cgroup still in R, and that got everything to clean up. So I expect
>>>> that you're correct, and we'll also need to send explicit SIGKILLs to
>>>> those processes still in R (in fact, probably just to all processes
>>>> still in the cgroup). Review incoming.
>>>>
>>>> - Benjamin
>>>>
>>>> -----------------------------------------------------------
>>>> This is an automatically generated e-mail. To reply, visit:
>>>> https://reviews.apache.org/r/7203/#review11794
>>>> -----------------------------------------------------------
>>>>
>>>> On Sept. 21, 2012, 2:02 a.m., Benjamin Hindman wrote:
>>>> >
>>>> > -----------------------------------------------------------
>>>> > This is an automatically generated e-mail. To reply, visit:
>>>> > https://reviews.apache.org/r/7203/
>>>> > -----------------------------------------------------------
>>>> >
>>>> > (Updated Sept. 21, 2012, 2:02 a.m.)
>>>> >
>>>> > Review request for mesos, Vinod Kone, Brian Wickman, and Jie Yu.
>>>> >
>>>> > Description
>>>> > -------
>>>> >
>>>> > See summary and
>>>> > http://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt:
>>>> >
>>>> > It's important to note that freezing can be incomplete. In that case we return
>>>> > EBUSY. This means that some tasks in the cgroup are busy doing something that
>>>> > prevents us from completely freezing the cgroup at this time. After EBUSY,
>>>> > the cgroup will remain partially frozen -- reflected by freezer.state reporting
>>>> > "FREEZING" when read. The state will remain "FREEZING" until one of these
>>>> > things happens:
>>>> >
>>>> >     1) Userspace cancels the freezing operation by writing "THAWED" to
>>>> >        the freezer.state file
>>>> >     2) Userspace retries the freezing operation by writing "FROZEN" to
>>>> >        the freezer.state file (writing "FREEZING" is not legal
>>>> >        and returns EINVAL)
>>>> >     3) The tasks that blocked the cgroup from entering the "FROZEN"
>>>> >        state disappear from the cgroup's set of tasks.
>>>> >
>>>> > Diffs
>>>> > -----
>>>> >
>>>> >   src/linux/cgroups.cpp 4efd06e
>>>> >
>>>> > Diff: https://reviews.apache.org/r/7203/diff/
>>>> >
>>>> > Testing
>>>> > -------
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Benjamin Hindman
