[ https://issues.apache.org/jira/browse/MESOS-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749050#comment-13749050 ]

Tobias Weingartner commented on MESOS-662:
------------------------------------------

Note, the "problem" is that the process that was ultimately frozen by the
kernel held a rather global lock, i.e. the global i-node lock taken in
ext3_dx_readdir(); later on, the same thread of execution gets frozen due
to OOM. At this point, any other process that needs the lock held by the
frozen thread ends up dog-piling behind it, and a deadlock results.

One possible solution is to mark threads that OOM in the kernel as having an 
OOM condition, allowing them to continue on their way (don't freeze/pre-empt 
them in kernel space), and upon return to user mode, freeze and pre-empt them 
at that point.
                
> Executor OOM could lead to a kernel hang
> ----------------------------------------
>
>                 Key: MESOS-662
>                 URL: https://issues.apache.org/jira/browse/MESOS-662
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>            Priority: Critical
>             Fix For: 0.15.0
>
>
> We observed this in production at Twitter.
> An executor OOMed and the kernel put it to sleep instead of killing it,
> because the Mesos slave disables OOM kills. Mesos disables the kernel OOM
> killer so that it can take some action itself. Currently, the only action
> it takes is cleaning up the cgroup, but in the future the action could be
> to increase the memory limit.
>
> [6290807.554028] SysRq : Show Blocked State
> [6290807.554175]   task                        PC stack   pid father
> [6290807.554251] python2.6       D ffff88097b1c3158     0 31039      1 0x00000000
> [6290807.554255]  ffff88120ae19b48 0000000000000082 0000000000000000 ffff88093ffffa08
> [6290807.554259]  ffff88093fffed00 ffff88120ae18010 0000000000013300 0000000000013300
> [6290807.554263]  0000000000013300 ffff88120ae19fd8 0000000000013300 0000000000013300
> [6290807.554267] Call Trace:
> [6290807.554279]  [<ffffffff814dfabd>] schedule+0x64/0x66
> [6290807.554285]  [<ffffffff8113ad09>] mem_cgroup_handle_oom+0x132/0x21f
> [6290807.554289]  [<ffffffff81138e62>] ? mem_cgroup_update_tree+0x165/0x165
> [6290807.554292]  [<ffffffff8113aef5>] mem_cgroup_do_charge+0xff/0x124
> [6290807.554295]  [<ffffffff8113b0ce>] __mem_cgroup_try_charge+0x1b4/0x298
> [6290807.554298]  [<ffffffff8113b643>] mem_cgroup_charge_common+0x6a/0x91
> [6290807.554301]  [<ffffffff8113b72f>] mem_cgroup_newpage_charge+0x23/0x25
> [6290807.554307]  [<ffffffff8110c26e>] do_anonymous_page+0x169/0x29a
> [6290807.554311]  [<ffffffff81110137>] handle_pte_fault+0x8d/0x1b1
> [6290807.554315]  [<ffffffff8110a793>] ? anon_vma_interval_tree_insert+0x8a/0x8c
> [6290807.554319]  [<ffffffff81113afe>] ? vma_adjust+0x50f/0x5b9
> [6290807.554324]  [<ffffffff811a196d>] ? ext3_dx_readdir+0x181/0x1d7
> [6290807.554327]  [<ffffffff81110489>] handle_mm_fault+0x22e/0x248
> [6290807.554332]  [<ffffffff814e3c6a>] do_page_fault+0x367/0x3ae
> [6290807.554335]  [<ffffffff811149f4>] ? do_brk+0x291/0x2f2
> [6290807.554339]  [<ffffffff81141289>] ? __fput+0x1e7/0x1f6
> [6290807.554342]  [<ffffffff814e0ba5>] page_fault+0x25/0x30
>
> A short-term solution is to enable the kernel OOM killer in cgroups (until
> we get around to adding support for soft memory limits in the cgroups
> isolator). The slave should still get an OOM notification and properly
> inform the frameworks of the OOM. One concern is that we don't know whether
> letting the kernel handle the OOM would cause problems with the cgroup
> cleanup done by the slave.
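The short-term fix can be sketched against the cgroup v1 memory controller interface described in the kernel's memory cgroup documentation: reading `memory.oom_control` reports `oom_kill_disable` and `under_oom`, writing `0` to it re-enables the kernel OOM killer, and writing `"<eventfd> <fd of memory.oom_control>"` to `cgroup.event_control` registers an eventfd that is signalled on OOM. The following is a minimal sketch, not Mesos's implementation; it assumes a cgroup v1 memory hierarchy, Linux, root privileges and Python 3.10+ (for `os.eventfd`), and the function names and the cgroup directory passed in are hypothetical.

```python
import os


def read_oom_control(cgroup_dir: str) -> dict[str, int]:
    """Parse memory.oom_control lines such as 'oom_kill_disable 1' / 'under_oom 1'."""
    fields = {}
    with open(os.path.join(cgroup_dir, "memory.oom_control")) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                fields[parts[0]] = int(parts[1])
    return fields


def stuck_in_oom(fields: dict[str, int]) -> bool:
    # oom_kill_disable=1 together with under_oom=1 is the state in this report:
    # tasks sleep on the memcg OOM wait queue instead of being killed.
    return fields.get("oom_kill_disable") == 1 and fields.get("under_oom") == 1


def enable_kernel_oom_killer(cgroup_dir: str) -> None:
    # Writing "0" clears oom_kill_disable, so the kernel kills a task on OOM.
    with open(os.path.join(cgroup_dir, "memory.oom_control"), "w") as f:
        f.write("0")


def register_oom_listener(cgroup_dir: str) -> tuple[int, int]:
    """Register an eventfd that the kernel signals on OOM in this cgroup.

    Returns (eventfd, oom_control_fd); the caller closes both when done.
    """
    efd = os.eventfd(0)
    ofd = os.open(os.path.join(cgroup_dir, "memory.oom_control"), os.O_RDONLY)
    try:
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as f:
            f.write(f"{efd} {ofd}")
    except OSError:
        os.close(ofd)
        os.close(efd)
        raise
    return efd, ofd


def wait_for_oom(efd: int) -> int:
    # Blocks until at least one OOM event has been signalled; returns the
    # eventfd counter, i.e. the number of events since the last read.
    return os.eventfd_read(efd)
```

With the killer re-enabled, a listener loop in the slave would block in `wait_for_oom` and, on wake-up, report the OOM to the framework before cleaning up the cgroup; whether that cleanup races with the kernel's own kill is exactly the open concern above.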

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
