> On Sept. 21, 2012, 3:23 a.m., Jie Yu wrote:
> > Ben, I am just curious whether you have observed a case in which a retry is
> > useful?
> >
> > From my experience, if a cgroup gets stuck in the FREEZING state (e.g. some
> > process is in T or Z state), writing FROZEN to retry never brings the state
> > to FROZEN.
> >
> > If you do see a case where a retry is useful, let me know.
>
> Benjamin Hindman wrote:
>     We've actually seen cases in which a process in the cgroup is still in R!
>     It's possible that at the time the kernel could not freeze that process for
>     whatever reason, and so retrying seems to be the only option (although I
>     hope it's not the case that the process can never be frozen, which would
>     seem like a pretty serious design issue).
>
> Jie Yu wrote:
>     > We've actually seen cases in which a process in the cgroup is still in R!
>
>     Maybe this is a kernel bug (race condition?) ;) From my understanding of
>     the kernel code, this seems to be impossible...
>
>     You can take a look at "kernel/cgroup_freezer.c".
>
>     Probably you can start with the function "freezer_write(...)".
>
> Benjamin Hindman wrote:
>     Hmm, so is the documentation out of date? The documentation makes me
>     think that partially frozen cgroups are indeed possible and expected, and
>     the user might need to try to freeze a cgroup multiple times (I attached
>     the relevant snippet from the documentation in the review summary above).
>
> Jie Yu wrote:
>     No, I am not saying that the doc is out of date. What I am trying to
>     understand is why a process in the "R" state cannot be frozen.
>
>     I will take a look at the kernel code that you use and let you know the
>     possible explanation.
>
> Benjamin Hindman wrote:
>     Sounds great, thanks! In the meantime, I'll commit this change and see
>     if it fixes the issue.
I ran 50 tasks, each of which forked off 20 processes (where each process in turn forked ~4 subprocesses). The memory limit for the tasks was about 10% too low for start-up but just about right for the steady state, which resulted in non-deterministic OOMing of tasks. Eventually all of them were scheduled and running fine, but only after ~200 tasks had been OOM-killed along the way. So we had a big sample set of OOM kills. Of the ~200, 3 got stuck in this state. The freezer froze _right_ in the middle of those 80 forks, and the cgroup was left in the FREEZING state with only one process in R and the rest in D/Ds.

- Brian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7203/#review11766
-----------------------------------------------------------


On Sept. 21, 2012, 2:02 a.m., Benjamin Hindman wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7203/
> -----------------------------------------------------------
>
> (Updated Sept. 21, 2012, 2:02 a.m.)
>
>
> Review request for mesos, Vinod Kone, Brian Wickman, and Jie Yu.
>
>
> Description
> -------
>
> See summary and
> http://www.kernel.org/doc/Documentation/cgroups/freezer-subsystem.txt:
>
> It's important to note that freezing can be incomplete. In that case we return
> EBUSY. This means that some tasks in the cgroup are busy doing something that
> prevents us from completely freezing the cgroup at this time. After EBUSY,
> the cgroup will remain partially frozen -- reflected by freezer.state reporting
> "FREEZING" when read.
> The state will remain "FREEZING" until one of these
> things happens:
>
> 1) Userspace cancels the freezing operation by writing "THAWED" to
>    the freezer.state file
> 2) Userspace retries the freezing operation by writing "FROZEN" to
>    the freezer.state file (writing "FREEZING" is not legal
>    and returns EINVAL)
> 3) The tasks that blocked the cgroup from entering the "FROZEN"
>    state disappear from the cgroup's set of tasks.
>
>
> Diffs
> -----
>
>   src/linux/cgroups.cpp 4efd06e
>
> Diff: https://reviews.apache.org/r/7203/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Benjamin Hindman
>
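The quoted documentation gives userspace two options when a cgroup is left in FREEZING: retry by writing "FROZEN", or cancel by writing "THAWED". A bounded retry policy that combines the two might look like the sketch below. It is a minimal, hypothetical illustration and not the code in src/linux/cgroups.cpp. `freezeWithRetry` and `FreezeOutcome` are made-up names. Reads and writes of freezer.state are injected as callbacks so the logic can be exercised without a real cgroup hierarchy. Against the real file, a write may fail with EBUSY while the freeze is incomplete, and a read returns a trailing newline. The callbacks are assumed to absorb both of these.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical result type; not part of Mesos.
struct FreezeOutcome {
  bool frozen;   // true if freezer.state read back as "FROZEN"
  int attempts;  // number of "FROZEN" writes performed
};

// Hypothetical helper. `writeState` is assumed to write a value to the
// cgroup's freezer.state file (ignoring EBUSY), and `readState` is assumed
// to return its contents with the trailing newline stripped.
FreezeOutcome freezeWithRetry(
    const std::function<void(const std::string&)>& writeState,
    const std::function<std::string()>& readState,
    int maxAttempts) {
  assert(maxAttempts > 0);
  for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
    // Option 2 in the documentation: (re)try by writing "FROZEN".
    // Writing "FREEZING" is not legal and would return EINVAL.
    writeState("FROZEN");
    if (readState() == "FROZEN") {
      return {true, attempt};
    }
    // Still partially frozen ("FREEZING"); loop and retry.
  }
  // Option 1 in the documentation: give up and cancel the freeze so the
  // cgroup's tasks are not left partially frozen.
  writeState("THAWED");
  return {false, maxAttempts};
}
```

Bounding the number of attempts and then thawing reflects the concern raised in the thread that retrying may never bring a stuck cgroup to FROZEN. Whether to thaw, keep retrying, or wait for the blocking tasks to leave the cgroup (option 3) is a policy choice that this sketch does not settle.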
