[
https://issues.apache.org/jira/browse/MESOS-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628724#comment-14628724
]
Michael Park commented on MESOS-3056:
-------------------------------------
One thought is that {{mutex}} may be in an invalid state because
{{pthread_mutex_init}} failed, and we don't check its return value.
{noformat}
pthread_mutex_init() will fail if:
[EAGAIN] The system temporarily lacks the resources to create another mutex.
[EINVAL] The value specified by attr is invalid.
[ENOMEM] The process cannot allocate enough memory to create another mutex.
{noformat}
It can't be {{EINVAL}}, since we pass {{NULL}}, which is explicitly documented
as a valid {{attr}} argument for {{pthread_mutex_init}}. It could be {{EAGAIN}}
or {{ENOMEM}}, although those are rare events. However, [~jieyu] mentioned in
the #mesos IRC channel that the problem is sporadic, so it's not completely out
of the realm of possibility.
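For illustration, checking the return value could look roughly like the sketch below. This is not the actual stout code: {{CheckedMutex}} is a hypothetical name, and throwing on failure is an assumption (aborting the process would be another option). The only library facts relied on are that {{pthread_mutex_init}}, {{pthread_mutex_lock}} and {{pthread_mutex_unlock}} return 0 on success and an error number on failure.

```cpp
#include <pthread.h>

#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical wrapper (not the actual stout code): initializes a
// pthread mutex and refuses to continue if initialization fails.
class CheckedMutex
{
public:
  CheckedMutex()
  {
    // pthread_mutex_init returns 0 on success and an error number
    // (e.g. EAGAIN or ENOMEM) on failure; it does not set errno.
    const int error = pthread_mutex_init(&mutex, nullptr);
    if (error != 0) {
      throw std::runtime_error(
          std::string("pthread_mutex_init failed: ") + std::strerror(error));
    }
  }

  ~CheckedMutex() { pthread_mutex_destroy(&mutex); }

  // A pthread_mutex_t must not be copied, so disable copying.
  CheckedMutex(const CheckedMutex&) = delete;
  CheckedMutex& operator=(const CheckedMutex&) = delete;

  // Both return 0 on success, or a pthread error number otherwise.
  int lock() { return pthread_mutex_lock(&mutex); }
  int unlock() { return pthread_mutex_unlock(&mutex); }

private:
  pthread_mutex_t mutex;
};
```

With a check like this, an {{EAGAIN}} or {{ENOMEM}} failure would surface at construction time with a clear message instead of as a later segfault inside {{pthread_mutex_lock}}.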
> Slave segfault related to Synchronized
> --------------------------------------
>
> Key: MESOS-3056
> URL: https://issues.apache.org/jira/browse/MESOS-3056
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.24.0
> Reporter: Jie Yu
>
> Environment:
> CentOS 5.11
> devtoolset-2 (gcc-4.8.2)
>
> Here is the backtrace on the coredump:
> {noformat}
> Program terminated with signal 11, Segmentation fault.
> #0 0x00007f0ba6b78dd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> (gdb) bt
> #0 0x00007f0ba6b78dd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #1 0x00007f0ba7bcd211 in operator() (arg=Unhandled dwarf expression opcode 0xf3
> ) at ./3rdparty/stout/include/stout/synchronized.hpp:84
> #2 _FUN (arg=Unhandled dwarf expression opcode 0xf3
> ) at ./3rdparty/stout/include/stout/synchronized.hpp:85
> #3 Synchronized (arg=Unhandled dwarf expression opcode 0xf3
> ) at ./3rdparty/stout/include/stout/synchronized.hpp:34
> #4 synchronize (arg=Unhandled dwarf expression opcode 0xf3
> ) at ./3rdparty/stout/include/stout/synchronized.hpp:89
> #5 approach (arg=Unhandled dwarf expression opcode 0xf3
> ) at src/gate.hpp:65
> #6 process::schedule (arg=Unhandled dwarf expression opcode 0xf3
> ) at src/process.cpp:614
> #7 0x00007f0ba6b7683d in start_thread () from /lib64/libpthread.so.0
> #8 0x00007f0ba6368fcd in clone () from /lib64/libc.so.6
> {noformat}