[ 
https://issues.apache.org/jira/browse/MESOS-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628724#comment-14628724
 ] 

Michael Park commented on MESOS-3056:
-------------------------------------

One thought is that {{mutex}} may be in an invalid state because of a 
{{pthread_mutex_init}} failure, which we don't check for.

{noformat}
pthread_mutex_init() will fail if:

[EAGAIN]  The system temporarily lacks the resources to create another mutex.

[EINVAL]  The value specified by attr is invalid.

[ENOMEM]  The process cannot allocate enough memory to create another mutex.
{noformat}

It can't be {{EINVAL}}, since we pass {{NULL}}, which is explicitly documented 
as a valid argument for {{pthread_mutex_init}}. It could be {{EAGAIN}} or 
{{ENOMEM}}, though those are rarer events. Still, [~jieyu] mentioned in #mesos 
IRC that the problem is sporadic, so a failed init is not out of the realm of 
possibility.
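
To make the concern concrete, here is a minimal sketch of what checking the return value could look like. The {{CheckedMutex}} class is hypothetical and is not the actual code in {{synchronized.hpp}} or {{gate.hpp}}; it only assumes POSIX threads and shows that {{pthread_mutex_init}} reports failure through its return value rather than through {{errno}}.

```cpp
#include <pthread.h>

#include <cassert>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper (not part of stout or Mesos). It refuses to
// construct if pthread_mutex_init fails, so a later pthread_mutex_lock
// never operates on a mutex that was not successfully initialized.
class CheckedMutex
{
public:
  CheckedMutex()
  {
    // NULL/nullptr attributes request the default mutex attributes.
    const int error = pthread_mutex_init(&mutex, nullptr);
    if (error != 0) {
      // pthread functions return the error code directly (for example
      // EAGAIN or ENOMEM) instead of setting errno.
      throw std::runtime_error(
          std::string("pthread_mutex_init failed: ") + std::strerror(error));
    }
  }

  ~CheckedMutex()
  {
    // Destroying a mutex that is still locked is a usage error; the
    // assert only flags it in debug builds.
    const int error = pthread_mutex_destroy(&mutex);
    assert(error == 0);
    (void) error; // Avoid an unused-variable warning under NDEBUG.
  }

  CheckedMutex(const CheckedMutex&) = delete;
  CheckedMutex& operator=(const CheckedMutex&) = delete;

  // Both calls return 0 on success or a pthread error code on failure.
  int lock() { return pthread_mutex_lock(&mutex); }
  int unlock() { return pthread_mutex_unlock(&mutex); }

private:
  pthread_mutex_t mutex;
};
```

With something along these lines, an {{EAGAIN}} or {{ENOMEM}} at init time would surface as an exception at construction instead of possibly showing up later as a crash inside {{pthread_mutex_lock}}. Whether throwing, aborting, or returning an error is the right policy for libprocess is a separate question.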

> Slave segfault related to Synchronized
> --------------------------------------
>
>                 Key: MESOS-3056
>                 URL: https://issues.apache.org/jira/browse/MESOS-3056
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.24.0
>            Reporter: Jie Yu
>
> Environment:
> CentOS 5.11
> devtoolset-2 (gcc-4.8.2)
>
> Here is the backtrace on the coredump:
> {noformat}
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f0ba6b78dd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x00007f0ba6b78dd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #1  0x00007f0ba7bcd211 in operator() (arg=Unhandled dwarf expression opcode 0xf3
> ) at ./3rdparty/stout/include/stout/synchronized.hpp:84
> #2  _FUN (arg=Unhandled dwarf expression opcode 0xf3
> ) at ./3rdparty/stout/include/stout/synchronized.hpp:85
> #3  Synchronized (arg=Unhandled dwarf expression opcode 0xf3
> ) at ./3rdparty/stout/include/stout/synchronized.hpp:34
> #4  synchronize (arg=Unhandled dwarf expression opcode 0xf3
> ) at ./3rdparty/stout/include/stout/synchronized.hpp:89
> #5  approach (arg=Unhandled dwarf expression opcode 0xf3
> ) at src/gate.hpp:65
> #6  process::schedule (arg=Unhandled dwarf expression opcode 0xf3
> ) at src/process.cpp:614
> #7  0x00007f0ba6b7683d in start_thread () from /lib64/libpthread.so.0
> #8  0x00007f0ba6368fcd in clone () from /lib64/libc.so.6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
