sgtm

@vinodkone


On Wed, Oct 3, 2012 at 9:31 AM, Benjamin Hindman <[email protected]> wrote:

> I think the best we can do here is _try_ and send a SIGKILL to all
> processes (in R or T or S or whatever) after we write FROZEN to
> freezer.state and we find out everything is still in FREEZING. We'll
> continue to write FROZEN to freezer.state after the interval AND we'll
> continue to try sending SIGKILL. Hopefully these two mechanisms will
> _eventually_ get everything cleaned up.
>
> How does that sound?
>
>
>
>
>
> On Sat, Sep 22, 2012 at 10:09 AM, Jie Yu <[email protected]> wrote:
>
>> Also, I don't understand what you mean here? Could you elaborate?
>>
>>
>> If you have two running processes to kill, you cannot send a SIGKILL to
>> them atomically. As a result, one process will (likely) be killed first while
>> the other is still making progress (though only for a very short
>> interval). That may cause unpredictable errors.
>>
>> - Jie
>>
>> On Sat, Sep 22, 2012 at 2:03 AM, Vinod Kone <[email protected]> wrote:
>>
>>> Thanks for digging up the kernel code Jie! It's fascinating.
>>>
>>>
>>>> Will that cause potential problems if there is more than one process in
>>>> 'R', because the kill is not atomic?
>>>>
>>>>
>>> Also, I don't understand what you mean here? Could you elaborate?
>>>
>>>
>>> Vinod
>>>
>>>
>>>> - Jie
>>>>
>>>> On Fri, Sep 21, 2012 at 9:10 PM, Benjamin Hindman <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> > On Sept. 21, 2012, 7 p.m., Vinod Kone wrote:
>>>>> > > lgtm. i've a feeling we need to also do a force kill. but we can
>>>>> > > do this after we see how brian's test pans out.
>>>>>
>>>>> I tried just writing FREEZING to the cgroup's freezer.state manually and
>>>>> that didn't seem to work. Meanwhile, I sent a SIGKILL to the process in
>>>>> the cgroup still in R, and that got everything to clean up. So I expect that
>>>>> you're correct, and we'll also need to send explicit SIGKILLs to those
>>>>> processes still in R (in fact, probably just to all processes still in the
>>>>> cgroup). Review incoming.
>>>>>
>>>>>
>>>>> - Benjamin
>>>>>
>>>>>
>>>>> -----------------------------------------------------------
>>>>>
>>>>> This is an automatically generated e-mail. To reply, visit:
>>>>> https://reviews.apache.org/r/7203/#review11794
>>>>>
>>>>> -----------------------------------------------------------
>>>>>
>>>>>
>>>>> On Sept. 21, 2012, 2:02 a.m., Benjamin Hindman wrote:
>>>>> >
>>>>> > -----------------------------------------------------------
>>>>> > This is an automatically generated e-mail. To reply, visit:
>>>>> > https://reviews.apache.org/r/7203/
>>>>> > -----------------------------------------------------------
>>>>> >
>>>>> > (Updated Sept. 21, 2012, 2:02 a.m.)
>>>>>
>>>>> >
>>>>> >
>>>>> > Review request for mesos, Vinod Kone, Brian Wickman, and Jie Yu.
>>>>> >
>>>>> >
>>>>> > Description
>>>>> > -------
>>>>>
>>>>> >
>>>>> > See summary and
>>>>> > http://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt:
>>>>> >
>>>>> > It's important to note that freezing can be incomplete. In that case
>>>>> > we return EBUSY. This means that some tasks in the cgroup are busy
>>>>> > doing something that prevents us from completely freezing the cgroup
>>>>> > at this time. After EBUSY, the cgroup will remain partially frozen --
>>>>> > reflected by freezer.state reporting "FREEZING" when read. The state
>>>>> > will remain "FREEZING" until one of these things happens:
>>>>> >
>>>>> >       1) Userspace cancels the freezing operation by writing "THAWED"
>>>>> >               to the freezer.state file
>>>>> >       2) Userspace retries the freezing operation by writing "FROZEN"
>>>>> >               to the freezer.state file (writing "FREEZING" is not
>>>>> >               legal and returns EINVAL)
>>>>> >       3) The tasks that blocked the cgroup from entering the "FROZEN"
>>>>> >               state disappear from the cgroup's set of tasks.
>>>>> >
>>>>> >
>>>>> > Diffs
>>>>> > -----
>>>>> >
>>>>> >   src/linux/cgroups.cpp 4efd06e
>>>>> >
>>>>> > Diff: https://reviews.apache.org/r/7203/diff/
>>>>> >
>>>>> >
>>>>> > Testing
>>>>> > -------
>>>>> >
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > Benjamin Hindman
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>
>>>
>>
>
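
For reference, the retry scheme proposed at the top of this thread (write
FROZEN, check freezer.state, SIGKILL whatever is still in the cgroup, wait,
repeat) could be sketched roughly as below. This is only an illustration in
Python, not the code under review in src/linux/cgroups.cpp. The function name
freeze_or_kill, its parameters, the retry limit, and the use of cgroup.procs
to list members are all assumptions made for the sketch.

```python
import errno
import os
import signal
import time


def freeze_or_kill(cgroup_dir, interval=0.1, attempts=50, kill=os.kill):
    """Retry freezing a cgroup, SIGKILLing its members while it is not FROZEN.

    Hypothetical helper: returns True once freezer.state reads FROZEN,
    False if `attempts` rounds were not enough.
    """
    state_path = os.path.join(cgroup_dir, "freezer.state")
    procs_path = os.path.join(cgroup_dir, "cgroup.procs")  # assumed member list

    for _ in range(attempts):
        # (Re)request the freeze. Per the quoted kernel doc, an incomplete
        # freeze is reported as EBUSY; that is not fatal here, we retry.
        try:
            with open(state_path, "w") as f:
                f.write("FROZEN")
        except OSError as e:
            if e.errno != errno.EBUSY:
                raise

        with open(state_path) as f:
            state = f.read().strip()
        if state == "FROZEN":
            return True

        # Not frozen yet: best-effort SIGKILL to every process still listed
        # in the cgroup, whatever state it is in. The kills are not atomic,
        # so some processes may keep running briefly after others are gone.
        with open(procs_path) as f:
            pids = [int(line) for line in f if line.strip()]
        for pid in pids:
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # already exited between the read and the kill

        time.sleep(interval)

    return False
```

The kill parameter exists only so the loop can be exercised without signalling
real processes; a caller would normally leave it at os.kill and pass the
freezer cgroup's directory. Nothing here guarantees cleanup: like the proposal
itself, it just keeps applying both mechanisms until one of them works or the
retry budget runs out.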
