Andreas Chalupa created MESOS-3730:
--------------------------------------
Summary: Docker containers won't start on a set of mesos slaves
Key: MESOS-3730
URL: https://issues.apache.org/jira/browse/MESOS-3730
Project: Mesos
Issue Type: Bug
Components: slave
Affects Versions: 0.25.0
Environment: CentOS 7
Reporter: Andreas Chalupa
We have 3 nodes that we've designated to run 'data' containers. These are
stateful containers that share a volume with the slave host machine. On two
different test beds we have now seen these slaves get into a state where
they can't start any containers. The stderr of the affected containers shows
this error:
mesos-docker-executor: /tmp/mesos-build/mesos-repo/3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:110: T& Option<T>::get() [with T = std::basic_string<char>]: Assertion `isSome()' failed.
*** Aborted at 1444832114 (unix time) try "date -d @1444832114" if you are using GNU date ***
PC: @ 0x7fc02694a5d7 __GI_raise
*** SIGABRT (@0x4a7b) received by PID 19067 (TID 0x7fc02913b8c0) from PID 19067; stack trace: ***
@ 0x7fc027504130 (unknown)
@ 0x7fc02694a5d7 __GI_raise
@ 0x7fc02694bcc8 __GI_abort
@ 0x7fc026943546 __assert_fail_base
@ 0x7fc0269435f2 __GI___assert_fail
@ 0x4166b2 Option<>::get()
@ 0x417725 main
@ 0x7fc026936af5 __libc_start_main
@ 0x417875 (unknown)
I don't know how to interpret this error or what it might mean.
Once a slave is in this state it stays broken and no containers can be
started on it. As far as I can tell it does not clear up until we reboot the
host machine. Restarting only the slave or Docker might be enough, but I am
not sure.
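One possible reading of the trace, offered as a sketch rather than a diagnosis: the assertion text suggests that get() was called from main of mesos-docker-executor on an Option<std::string> that held no value. Which value was missing cannot be determined from the trace alone. The code below is not the stout implementation. SketchOption and valueOrMessage are hypothetical names, used only to illustrate how an assert-guarded get() aborts the process and how a caller can check isSome() first.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for an Option-like type; NOT stout's Option<T>.
// Assumes T is default-constructible (true for std::string).
template <typename T>
class SketchOption {
public:
  // Empty ("none") state.
  SketchOption() : some_(false), value_() {}

  // State holding a value.
  explicit SketchOption(T value) : some_(true), value_(std::move(value)) {}

  bool isSome() const { return some_; }

  // Calling get() on an empty option fails the assertion, which calls
  // abort() and raises SIGABRT, as seen in the reported stack trace.
  const T& get() const {
    assert(isSome());
    return value_;
  }

private:
  bool some_;
  T value_;
};

// Hypothetical defensive accessor: test isSome() before calling get(),
// so an empty option yields a message instead of an abort.
std::string valueOrMessage(const SketchOption<std::string>& option) {
  if (!option.isSome()) {
    return "missing required value";
  }
  return option.get();
}
```

With a guard like this, an empty option produces a readable message rather than a SIGABRT, which would make it easier to tell which input the executor did not receive.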
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)