So after cat'ing around in /proc, I can see that the JVM is apparently blocked on disk I/O:
root@i-bde90adc:~# cat /proc/24689/status
> Name: java
> State: D (disk sleep)
> Tgid: 24689
> Pid: 24689
> PPid: 24629
> TracerPid: 0
> Uid: 997 997 997 997
> Gid: 997 997 997 997
> FDSize: 512
> Groups:
> VmPeak: 2198700 kB
> VmSize: 2196016 kB
> VmLck: 0 kB
> VmPin: 0 kB
> VmHWM: 305276 kB
> VmRSS: 296828 kB
> VmData: 2040128 kB
> VmStk: 156 kB
> VmExe: 4 kB
> VmLib: 32692 kB
> VmPTE: 1292 kB
> VmSwap: 0 kB
> Threads: 104
> SigQ: 0/180427
> SigPnd: 0000000000000000
> ShdPnd: 0000000000000000
> SigBlk: 0000000000000000
> SigIgn: 0000000000000000
> SigCgt: 2000000181005ccf
> CapInh: 0000000000000000
> CapPrm: 0000000000000000
> CapEff: 0000000000000000
> CapBnd: ffffffffffffffff
> Cpus_allowed: ffff
> Cpus_allowed_list: 0-15
> Mems_allowed: 00000000,00000001
> Mems_allowed_list: 0
> voluntary_ctxt_switches: 25
> nonvoluntary_ctxt_switches: 0

Mesos looks okay:

root@i-bde90adc:~# cat /proc/3684/status
> Name: lt-mesos-slave
> State: S (sleeping)
> Tgid: 3684
> Pid: 3684
> PPid: 1131
> TracerPid: 0
> Uid: 0 0 0 0
> Gid: 0 0 0 0
> FDSize: 256
> Groups:
> VmPeak: 1531868 kB
> VmSize: 1531864 kB
> VmLck: 0 kB
> VmPin: 0 kB
> VmHWM: 15272 kB
> VmRSS: 15272 kB
> VmData: 1402012 kB
> VmStk: 136 kB
> VmExe: 152 kB
> VmLib: 31824 kB
> VmPTE: 416 kB
> VmSwap: 0 kB
> Threads: 20
> SigQ: 1/180427
> SigPnd: 0000000000000000
> ShdPnd: 0000000000000000
> SigBlk: 0000000000000000
> SigIgn: 0000000000000000
> SigCgt: 0000000180000000
> CapInh: 0000000000000000
> CapPrm: ffffffffffffffff
> CapEff: ffffffffffffffff
> CapBnd: ffffffffffffffff
> Cpus_allowed: ffff
> Cpus_allowed_list: 0-15
> Mems_allowed: 00000000,00000001
> Mems_allowed_list: 0
> voluntary_ctxt_switches: 69
> nonvoluntary_ctxt_switches: 0

Is it possible that changing the state of the freezer from thawed -> frozen causes all disk I/O to stop? I'm not sure where to look to mitigate this.

On Fri, May 10, 2013 at 11:58 AM, Brenden Matthews <[email protected]> wrote:

> The kernel gives me a bunch of "soft lockup: CPU stuck!" and "task java blocked for more than 120 seconds". I think these are fairly generic errors and can be caused by any sort of race condition on the system. Strace also isn't helpful, as it doesn't tell me which resource the processes are stuck on.
>
> I'll try and check /proc/pid/status next time it happens. Other tools like lsof are unable to run.
>
> On Fri, May 10, 2013 at 11:49 AM, Vinod Kone <[email protected]> wrote:
>
>> ---------- Forwarded message ----------
>> From: Vinod Kone <[email protected]>
>> Date: Fri, May 10, 2013 at 11:25 AM
>> Subject: Re: OOM killer stuck when freezing cgroup
>> To: "[email protected]" <[email protected]>
>>
>> Anything in the kernel logs during this time?
>>
>> @vinodkone
>>
>> On Fri, May 10, 2013 at 11:13 AM, Benjamin Mahler <[email protected]> wrote:
>>
>> > Sorry about that miscommunication; I meant to say "from the kernel OOM killer", and you're right that the OOM shouldn't be the source of the problem here (just trying to make a mental note that our accounting is wrong). OOMs are indeed expected as part of normal operation on the slave.
>> >
>> > It's tough to say what the issue is, as I don't have any experience running on newer kernels like 3.2. I would recommend augmenting the logging in the cgroups::Freezer to try to nail down at which point the slave is blocking. In particular, in Freezer::freeze() and Freezer::watchFrozen().
>> >
>> > Also, do you have /proc/<slave_pid>/status output for the slave when it's frozen?
>> >
>> > It will be really useful to know if that is consistently the last log line you see before things lock up as well, as that helps narrow this down.
>> >
>> > On Fri, May 10, 2013 at 10:44 AM, Brenden Matthews <[email protected]> wrote:
>> >
>> > > This isn't a kernel OOM. I started the debugger and let it go overnight with no luck. It just doesn't want to budge. I'm aware of the memory limit issue, but I don't think that's the real problem here. Regardless of whether we run out of memory, the whole system shouldn't freeze when trying to kill the process. I'm not sure if this is a kernel problem or a Mesos problem, but I suspect that the issue isn't with the kernel. Walking through the Mesos code, I can't see anything that is obviously wrong either. I'm using my own branch from here: https://github.com/airbnb/mesos/tree/testing.
>> > >
>> > > My next step will be to run a bunch of slaves in GDB and then launch some tasks. Hopefully I can stop them and get a backtrace. I thought it might be worth asking about it here to see if anyone else has dealt with this and knows what's going on.
>> > >
>> > > These are EC2 instances running the latest Ubuntu Precise AMIs. Here's the `uname -a' from one:
>> > >
>> > > Linux i-b5dbc1d5 3.2.0-41-virtual #66-Ubuntu SMP Thu Apr 25 03:47:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>> > >
>> > > On Fri, May 10, 2013 at 10:21 AM, Benjamin Mahler <[email protected]> wrote:
>> > >
>> > > > Hey Brenden, what kernel version are you running? What version of the code are you running? Are you seeing this on different machines? Can you confirm you're unable to attach to the slave with gdb (that would be the easiest way to figure out what's blocking in the slave)? Is that always the last line in the log before the processes get stuck?
>> > > >
>> > > > As an aside, it looks like we aren't properly setting the JVM heap size for TaskTrackers in the MesosScheduler for Hadoop. We try to estimate the JVM memory overhead, but it was perhaps not enough to avoid kernel OOMs.
>> > > >
>> > > > On Fri, May 10, 2013 at 9:57 AM, Brenden Matthews <[email protected]> wrote:
>> > > >
>> > > > > Hey folks,
>> > > > >
>> > > > > I'm bumping into a problem frequently. When an executor exceeds its memory limit and the executor is deemed in need of execution, the entire system becomes stuck (though I'm unsure as to whether it's stuck before or after attempting to freeze). Here's what the mesos slave log looks like:
>> > > > >
>> > > > > > I0510 04:22:37.935608 13622 cgroups_isolator.cpp:1023] OOM notifier is triggered for executor executor_Task_Tracker_446 of framework 201305100055-1471680778-5050-18199-0000 with uuid 14a21d5a-a670-40fd-91b1-54c42b9a30e0
>> > > > > > I0510 04:22:37.935925 13622 cgroups_isolator.cpp:1068] OOM detected for executor executor_Task_Tracker_446 of framework 201305100055-1471680778-5050-18199-0000 with uuid 14a21d5a-a670-40fd-91b1-54c42b9a30e0
>> > > > > > I0510 04:22:37.936982 13622 cgroups_isolator.cpp:1109] Memory limit exceeded: Requested: 2570MB Used: 2570MB
>> > > > > > MEMORY STATISTICS:
>> > > > > > cache 1682083840
>> > > > > > rss 1012756480
>> > > > > > mapped_file 1085440
>> > > > > > pgpgin 22863635
>> > > > > > pgpgout 22205715
>> > > > > > swap 0
>> > > > > > pgfault 11881912
>> > > > > > pgmajfault 304
>> > > > > > inactive_anon 0
>> > > > > > active_anon 1012756480
>> > > > > > inactive_file 1065136128
>> > > > > > active_file 616947712
>> > > > > > unevictable 0
>> > > > > > hierarchical_memory_limit 2694840320
>> > > > > > hierarchical_memsw_limit 9223372036854775807
>> > > > > > total_cache 1682083840
>> > > > > > total_rss 1012756480
>> > > > > > total_mapped_file 1085440
>> > > > > > total_pgpgin 22863635
>> > > > > > total_pgpgout 22205715
>> > > > > > total_swap 0
>> > > > > > total_pgfault 11881912
>> > > > > > total_pgmajfault 304
>> > > > > > total_inactive_anon 0
>> > > > > > total_active_anon 1012756480
>> > > > > > total_inactive_file 1065136128
>> > > > > > total_active_file 616947712
>> > > > > > total_unevictable 0
>> > > > > > I0510 04:22:37.965666 13622 cgroups_isolator.cpp:620] Killing executor executor_Task_Tracker_446 of framework 201305100055-1471680778-5050-18199-0000
>> > > > > > I0510 04:22:37.967418 13621 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305100055-1471680778-5050-18199-0000_executor_executor_Task_Tracker_446_tag_14a21d5a-a670-40fd-91b1-54c42b9a30e0
>> > > > >
>> > > > > After the last line in the log, both the executing task and the mesos process are stuck indefinitely. Usually the way to resolve this is `killall -9 java lt-mesos-slave' or a reboot.
>> > > > >
>> > > > > If I cat /cgroup/mesos/framework.../freezer.state, it shows that it's freezing. Since the entire system is stuck, I cannot attach a debugger to either the JVM or mesos.
>> > > > >
>> > > > > Any ideas? This problem is very frustrating and I'm stuck at this point.
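
To make the freezer discussion above concrete, here is roughly what the "Trying to freeze cgroup" step drives at the kernel interface (the cgroup v1 freezer). This is a simplified shell sketch, not the actual Mesos cgroups::Freezer code, and CGROUP is a placeholder for a path like the framework/executor path in the log:

    #!/bin/sh
    # Simplified sketch of a cgroup (v1) freeze; not the Mesos cgroups::Freezer
    # implementation. CGROUP is a placeholder for a path like the one in the log.
    CGROUP=/cgroup/mesos/framework_example_executor_example_tag_example

    # Ask the kernel to freeze every task in the cgroup.
    echo FROZEN > "$CGROUP/freezer.state"

    # The kernel reports FREEZING until every task has actually been frozen.
    # A task in uninterruptible (D) sleep, like the java process above, cannot
    # be frozen until its I/O completes, so this loop can spin indefinitely and
    # whatever is waiting on the freeze appears hung.
    while [ "$(cat "$CGROUP/freezer.state")" != "FROZEN" ]; do
        sleep 1
    done
    echo "cgroup frozen"

The wait loop is the point where a stuck freeze shows up: freezer.state keeps reporting FREEZING while a task sits in D state, which matches what was observed in this thread.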
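
Along the same lines, a minimal sketch of the inspection done earlier in the thread, generalized from a single PID to every task in the stuck cgroup; the path is again a placeholder:

    #!/bin/sh
    # Inspection sketch for a cgroup stuck in FREEZING; CGROUP is a placeholder.
    CGROUP=/cgroup/mesos/framework_example_executor_example_tag_example

    # THAWED, FREEZING, or FROZEN.
    echo "freezer.state: $(cat "$CGROUP/freezer.state")"

    # Name and scheduler state of every task in the cgroup. 'D (disk sleep)'
    # means uninterruptible sleep: the task can be neither frozen nor killed
    # until the I/O it is waiting on completes.
    for pid in $(cat "$CGROUP/tasks"); do
        echo "--- $pid ---"
        grep -E '^(Name|State):' "/proc/$pid/status"
    done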
