Daniel Hall created MESOS-2186:
----------------------------------
Summary: Mesos crashes if any configured zookeeper does not resolve.
Key: MESOS-2186
URL: https://issues.apache.org/jira/browse/MESOS-2186
Project: Mesos
Issue Type: Bug
Affects Versions: 0.21.0
Environment: Zookeeper: 3.4.5+28-1.cdh4.7.1.p0.13.el6
Mesos: 0.21.0-1.0.centos65
CentOS: CentOS release 6.6 (Final)
Reporter: Daniel Hall
Priority: Critical
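When starting Mesos, if any one of the configured ZooKeeper servers does not
resolve in DNS, Mesos will crash and refuse to start, even when the remaining
servers resolve correctly. We noticed this issue while we were rebuilding one
of our ZooKeeper hosts in Google Compute (which derives its DNS entries from
the machines that are currently running, so a host being rebuilt temporarily
has no record).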
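Here is a log from a failed startup (hostnames and IP addresses have been
sanitised). The fatal error is raised from zookeeper.cpp:113 after
zookeeper_init fails on the getaddrinfo lookup.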
{noformat}
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.088835 28627
main.cpp:292] Starting Mesos master
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09
22:54:54,095:28627(0x7fa9f042f700):ZOO_ERROR@getaddrs@599: getaddrinfo: No such
file or directory
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]:
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.095239 28642
zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such file or
directory [2]
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack
trace: ***
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09
22:54:54,097:28627(0x7fa9ed22a700):ZOO_ERROR@getaddrs@599: getaddrinfo: No such
file or directory
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]:
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.097718 28647
zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such file or
directory [2]
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack
trace: ***
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160
google::LogMessage::Fail()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160
google::LogMessage::Fail()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a00b9
google::LogMessage::SendToLog()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09
22:54:54,108:28627(0x7fa9ef02d700):ZOO_ERROR@getaddrs@599: getaddrinfo: No such
file or directory
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]:
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.097718 28647
zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such file or
directory [2]F1209 22:54:54.108422 28644 zookeeper.cpp:113] Failed to create
ZooKeeper, zookeeper_init: No such file or directory [2]
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack
trace: ***
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160
google::LogMessage::Fail()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09
22:54:54,109:28627(0x7fa9f0e30700):ZOO_ERROR@getaddrs@599: getaddrinfo: No such
file or directory
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]:
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.097718 28647
zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such file or
directory [2]F1209 22:54:54.108422 28644 zookeeper.cpp:113] Failed to create
ZooKeeper, zookeeper_init: No such file or directory [2]F1209 22:54:54.109864
28641 zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such
file or directory [2]
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack
trace: ***
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160
google::LogMessage::Fail()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a00b9
google::LogMessage::SendToLog()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a00b9
google::LogMessage::SendToLog()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.123208 28640
master.cpp:318] Master 20141209-225454-4155764746-5050-28627
(mesosmaster-2.internal) started on 10.x.x.x:5050
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.123306 28640
master.cpp:366] Master allowing unauthenticated frameworks to register
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.123327 28640
master.cpp:371] Master allowing unauthenticated slaves to register
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a00b9
google::LogMessage::SendToLog()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f569fa97
google::LogMessage::Flush()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f569fa97
google::LogMessage::Flush()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f569fa97
google::LogMessage::Flush()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f569f8af
google::LogMessage::~LogMessage()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a086f
google::ErrnoLogMessage::~ErrnoLogMessage()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f569fa97
google::LogMessage::Flush()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.159488 28643
contender.cpp:131] Joining the ZK group
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.160753 28640
master.cpp:1202] Successfully attached file '/var/log/mesos/mesos-master.INFO'
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f569f8af
google::LogMessage::~LogMessage()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a086f
google::ErrnoLogMessage::~ErrnoLogMessage()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f569f8af
google::LogMessage::~LogMessage()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a086f
google::ErrnoLogMessage::~ErrnoLogMessage()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f569f8af
google::LogMessage::~LogMessage()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a086f
google::ErrnoLogMessage::~ErrnoLogMessage()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f5201abf
ZooKeeperProcess::initialize()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f5604367
process::ProcessManager::resume()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f5201abf
ZooKeeperProcess::initialize()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f5201abf
ZooKeeperProcess::initialize()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f5201abf
ZooKeeperProcess::initialize()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f5604367
process::ProcessManager::resume()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f5604367
process::ProcessManager::resume()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f5604367
process::ProcessManager::resume()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f55fa21f
process::schedule()
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x3e498079d1
(unknown)
Dec 9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x3e494e89dd
(unknown)
Dec 9 22:54:54 mesosmaster-2 abrt[28650]: Not saving repeating crash in
'/usr/local/sbin/mesos-master'
Dec 9 22:54:54 mesosmaster-2 init: mesos-master main process (28627) killed by
ABRT signal
{noformat}
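To narrow down which host triggers the failure, something like the following could be run before starting the master. This is only a rough sketch, not part of Mesos: the helper names `zk_hosts` and `unresolvable` are made up for illustration, the hostnames in the usage note are placeholders, and it assumes the usual `zk://host1:port1,host2:port2/path` form of the ZooKeeper URL (IPv6 literals are not handled).

```python
import socket
from typing import Callable, List


def zk_hosts(zk_url: str) -> List[str]:
    """Return the hostnames from a zk://host1:port,host2:port/path URL.

    Hypothetical helper for illustration; not a Mesos API.
    """
    rest = zk_url[len("zk://"):] if zk_url.startswith("zk://") else zk_url
    # Everything before the first "/" is the comma-separated server list.
    authority = rest.split("/", 1)[0]
    # Drop optional "user:password@" credentials.
    authority = authority.rsplit("@", 1)[-1]
    hosts = []
    for entry in authority.split(","):
        entry = entry.strip()
        if entry:
            # Strip the ":port" suffix if one is present.
            hosts.append(entry.split(":", 1)[0])
    return hosts


def unresolvable(hosts: List[str],
                 resolve: Callable = socket.getaddrinfo) -> List[str]:
    """Return the subset of hosts for which name resolution fails.

    `resolve` defaults to socket.getaddrinfo and can be swapped out
    for testing without touching real DNS.
    """
    bad = []
    for host in hosts:
        try:
            resolve(host, None)
        except socket.gaierror:
            bad.append(host)
    return bad
```

Calling `unresolvable(zk_hosts("zk://zk1.example.invalid:2181,zk2.example.invalid:2181/mesos"))` returns the hosts that fail to resolve. Any non-empty result would, on 0.21.0, be expected to produce the crash shown in the log above.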