[ https://issues.apache.org/jira/browse/MESOS-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004993#comment-14004993 ]
Benjamin Mahler commented on MESOS-1326:
----------------------------------------
On the other hand, there are indeed some cases where the slave is able to
restart within 10 seconds of aborting on a zookeeper_init failure:
{noformat}
W0521 10:02:13.549584 54783 slave.cpp:425] Ignoring shutdown message from [email protected]:5050 because it is not from the registered master: None
2014-05-21 10:02:14,973:54764(0x7fb89cfc6940):ZOO_ERROR@getaddrs@599: getaddrinfo: Invalid argument
F0521 10:02:14.973850 54772 zookeeper.cpp:74] Failed to create ZooKeeper, zookeeper_init: Invalid argument [22]
*** Check failure stack trace: ***
@ 0x7fb8a48125fd google::LogMessage::Fail()
@ 0x7fb8a4814444 google::LogMessage::SendToLog()
@ 0x7fb8a48121ec google::LogMessage::Flush()
@ 0x7fb8a48123f9 google::LogMessage::~LogMessage()
@ 0x7fb8a4813372 google::ErrnoLogMessage::~ErrnoLogMessage()
@ 0x7fb8a455d561 ZooKeeper::ZooKeeper()
@ 0x7fb8a4567f38 zookeeper::GroupProcess::expired()
@ 0x7fb8a4568198 zookeeper::GroupProcess::timedout()
@ 0x7fb8a47481c2 process::ProcessManager::resume()
@ 0x7fb8a47484bc process::schedule()
@ 0x7fb8a3cbc83d start_thread
@ 0x7fb8a2a2426d clone
/usr/local/bin/mesos-slave.sh: line 115: 54764 Aborted (core dumped) $debug /usr/local/sbin/mesos-slave --port=5051 --resources="${MESOS_RESOURCES}" --attributes="${MESOS_ATTRIBUTES}" --master="${master_zoo_url}" --log_dir="${log_dir}" ${EXTRA_FLAGS} "$@"
Slave Exit Status: 134
I0521 10:02:22.104622 44885 logging.cpp:106] Logging INFO level started!
I0521 10:02:22.105105 44885 main.cpp:126] Build: 2014-04-24 19:52:05 by mockbuild
I0521 10:02:22.105131 44885 main.cpp:128] Version: 0.19.0-tw3
W0521 10:02:22.105160 44885 containerizer.cpp:169] The 'cgroups' isolation flag is deprecated, please update your flags to '--isolation=cgroups/cpu,cgroups/mem'.
I0521 10:02:22.105423 44885 containerizer.cpp:177] Using isolation: cgroups/cpu,cgroups/mem
I0521 10:02:22.128782 44885 cgroups_launcher.cpp:58] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the cgroups launcher
2014-05-21 10:02:22,129:44885(0x7f6d93c62940):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
I0521 10:02:22.129151 44885 main.cpp:149] Starting Mesos slave
{noformat}
> Retry policy for zookeeper_init failures
> ----------------------------------------
>
> Key: MESOS-1326
> URL: https://issues.apache.org/jira/browse/MESOS-1326
> Project: Mesos
> Issue Type: Improvement
> Affects Versions: 0.19.0
> Reporter: Jie Yu
> Labels: reliability
>
> Currently, we fatal whenever zookeeper_init fails. This is sometimes
> annoying: during a DNS failover we may hit this failure repeatedly, even
> though it is transient and we don't necessarily need to fatal in those cases.
> I am wondering whether we could retry on zookeeper_init failures instead?
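One possible shape for such a retry policy, as a minimal sketch only: instead of aborting on the first failure, the initialization attempt is wrapped in a loop with capped exponential backoff. The helper name `retryWithBackoff`, its parameters, and any delay values are hypothetical and are not existing Mesos code; the callable stands in for a zookeeper_init() call and is assumed to return true when a handle was created.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

// Hypothetical retry helper (not part of Mesos): invokes `attempt` until it
// returns true or `maxAttempts` calls have been made. The delay between
// attempts starts at `initialDelay` and doubles after each failure, capped
// at `maxDelay`. Returns true if some attempt succeeded.
bool retryWithBackoff(const std::function<bool()>& attempt,
                      int maxAttempts,
                      std::chrono::milliseconds initialDelay,
                      std::chrono::milliseconds maxDelay) {
  assert(maxAttempts >= 1);
  std::chrono::milliseconds delay = initialDelay;
  for (int i = 1; i <= maxAttempts; ++i) {
    if (attempt()) {
      return true;  // e.g. zookeeper_init() returned a non-NULL handle
    }
    if (i == maxAttempts) {
      break;  // out of attempts; the caller decides whether to abort
    }
    std::cerr << "Initialization attempt " << i << " failed; retrying in "
              << delay.count() << "ms" << std::endl;
    std::this_thread::sleep_for(delay);
    // Double the delay for the next round, but never exceed maxDelay.
    delay = std::min<std::chrono::milliseconds>(delay * 2, maxDelay);
  }
  return false;
}
```

For example, `retryWithBackoff(init, 5, std::chrono::milliseconds(500), std::chrono::seconds(10))` would try up to five times before giving up, at which point the caller could still fail fatally as it does today. Whether to cap the number of attempts or retry indefinitely is a policy choice this sketch leaves to the caller.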
--
This message was sent by Atlassian JIRA
(v6.2#6252)