Seems OK. I'm surprised that a loop which just keeps rewriting FROZEN doesn't work. It would be interesting to have some introspection into how many iterations this takes in practice; I guess that could be done with some unix-fu on the logs.
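To make the iteration count easy to pull out of the logs, here is a rough sketch of the retry scheme Ben proposes in the quoted message below, written so that it returns the number of attempts it took. Everything named here (FreezerHooks, freezeWithKill, maxAttempts) is hypothetical and is not the code in src/linux/cgroups.cpp; the hooks stand in for reading and writing freezer.state and for signalling the tasks in the cgroup.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical hooks (assumptions for illustration, not a Mesos API).
// A real implementation would read/write <cgroup>/freezer.state and send
// SIGKILL to every pid listed in the cgroup's tasks file.
struct FreezerHooks {
  std::function<void(const std::string&)> writeState;  // e.g. write "FROZEN"
  std::function<std::string()> readState;              // "FROZEN", "FREEZING" or "THAWED"
  std::function<void()> killAll;                       // best-effort SIGKILL to all tasks
  std::function<void()> wait;                          // sleep for the retry interval
};

// Repeatedly writes FROZEN and, while the cgroup still reports FREEZING,
// tries to kill everything in it. Returns the number of attempts needed to
// reach FROZEN, or -1 if the cgroup was not FROZEN after maxAttempts.
int freezeWithKill(const FreezerHooks& hooks, int maxAttempts) {
  assert(maxAttempts > 0);
  for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
    // Retrying with "FROZEN" is the legal way to retry; per the kernel doc,
    // writing "FREEZING" returns EINVAL.
    hooks.writeState("FROZEN");
    if (hooks.readState() == "FROZEN") {
      return attempt;  // log this value to see how many iterations were needed
    }
    hooks.killAll();  // tasks that exit drop out of the cgroup's task set
    hooks.wait();
  }
  return -1;
}
```

Logging the return value at the call site would give the "how many iterations" number directly, without having to reconstruct it from repeated log lines.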

On Wed, Oct 3, 2012 at 9:31 AM, Benjamin Hindman <[email protected]> wrote:

> I think the best we can do here is _try_ and send a SIGKILL to all
> processes (in R or T or S or whatever) after we write FROZEN to
> freezer.state and we find out everything is still in FREEZING. We'll
> continue to write FROZEN to freezer.state after the interval AND we'll
> continue to try sending SIGKILL. Hopefully these two mechanisms will
> _eventually_ get everything cleaned up.
>
> How does that sound?
>
>
> On Sat, Sep 22, 2012 at 10:09 AM, Jie Yu <[email protected]> wrote:
>
> > Also, I don't understand what you mean here? Could you elaborate?
> >
> >
> > If you have two running process to kill, you cannot send a SIGKILL to them
> > atomically. As a result, one proces will be killed first (likely), and the
> > other process is still making progress (though in a very short interval).
> > That may cause unpredictable errors.
> >
> > - Jie
> >
> > On Sat, Sep 22, 2012 at 2:03 AM, Vinod Kone <[email protected]> wrote:
> >
> >> Thanks for digging up the kernel code Jie! Its fascinating.
> >>
> >>
> >>> Will that cause potential problems if there are more than 1 process in
> >>> 'R' because the kill is not atomic.
> >>>
> >>>
> >> Also, I don't understand what you mean here? Could you elaborate?
> >>
> >>
> >> Vinod
> >>
> >>
> >>> - Jie
> >>>
> >>> On Fri, Sep 21, 2012 at 9:10 PM, Benjamin Hindman <[email protected]> wrote:
> >>>
> >>>> > On Sept. 21, 2012, 7 p.m., Vinod Kone wrote:
> >>>> > > lgtm. i've a feeling we need to also do a force kill. but we can do
> >>>> > > this after we see how brian's test pans out.
> >>>>
> >>>> I tried just setting FREEZING to the cgroup freezer.state manually and
> >>>> that didn't seem to work. Meanwhile, I sent a SIGKILL to the process in the
> >>>> cgroup still in R, and that got everything to cleanup. So I expect that
> >>>> you're correct, and we'll also need to send explicit SIGKILLs to those
> >>>> processes still in R (in fact, probably just to all processes still in the
> >>>> cgroup). Review incoming.
> >>>>
> >>>>
> >>>> - Benjamin
> >>>>
> >>>>
> >>>> -----------------------------------------------------------
> >>>> This is an automatically generated e-mail. To reply, visit:
> >>>> https://reviews.apache.org/r/7203/#review11794
> >>>> -----------------------------------------------------------
> >>>>
> >>>>
> >>>> On Sept. 21, 2012, 2:02 a.m., Benjamin Hindman wrote:
> >>>> >
> >>>> > -----------------------------------------------------------
> >>>> > This is an automatically generated e-mail. To reply, visit:
> >>>> > https://reviews.apache.org/r/7203/
> >>>> > -----------------------------------------------------------
> >>>> >
> >>>> > (Updated Sept. 21, 2012, 2:02 a.m.)
> >>>> >
> >>>> >
> >>>> > Review request for mesos, Vinod Kone, Brian Wickman, and Jie Yu.
> >>>> >
> >>>> >
> >>>> > Description
> >>>> > -------
> >>>> >
> >>>> > See summary and
> >>>> > http://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt :
> >>>> >
> >>>> > It's important to note that freezing can be incomplete. In that case we return
> >>>> > EBUSY. This means that some tasks in the cgroup are busy doing something that
> >>>> > prevents us from completely freezing the cgroup at this time. After EBUSY,
> >>>> > the cgroup will remain partially frozen -- reflected by freezer.state reporting
> >>>> > "FREEZING" when read. The state will remain "FREEZING" until one of these
> >>>> > things happens:
> >>>> >
> >>>> >   1) Userspace cancels the freezing operation by writing "THAWED" to
> >>>> >      the freezer.state file
> >>>> >   2) Userspace retries the freezing operation by writing "FROZEN" to
> >>>> >      the freezer.state file (writing "FREEZING" is not legal
> >>>> >      and returns EINVAL)
> >>>> >   3) The tasks that blocked the cgroup from entering the "FROZEN"
> >>>> >      state disappear from the cgroup's set of tasks.
> >>>> >
> >>>> >
> >>>> > Diffs
> >>>> > -----
> >>>> >
> >>>> >   src/linux/cgroups.cpp 4efd06e
> >>>> >
> >>>> > Diff: https://reviews.apache.org/r/7203/diff/
> >>>> >
> >>>> >
> >>>> > Testing
> >>>> > -------
> >>>> >
> >>>> >
> >>>> > Thanks,
> >>>> >
> >>>> > Benjamin Hindman
