[ 
https://issues.apache.org/jira/browse/MESOS-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-6205:
-----------------------------
    Description: 
I followed this 
[doc][https://open.mesosphere.com/getting-started/install/#verifying-installation]
 to set up a Mesos cluster.

There are three VMs (Ubuntu 12, CentOS 6.5, CentOS 7.2).
{code}
    $ cat /etc/hosts
    10.142.55.190 zk1
    10.142.55.196 zk2
    10.142.55.202 zk3
{code}
Config on each machine:
{code}
    $ cat /etc/mesos/zk
    zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos
{code}
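A quick way to rule out basic connectivity problems is to split the `zk://` URL above into its servers and try a TCP connection to each one. The following is only a minimal sketch: the helper names `parse_zk_url`, `is_reachable` and `check_all` are hypothetical, the parser assumes no `user:password@` credentials in the URL, and it falls back to 2181 (the usual ZooKeeper client port) when a port is omitted.

```python
import socket


def parse_zk_url(url, default_port=2181):
    """Split a zk:// URL into ([(host, port), ...], znode_path).

    Assumes the URL carries no credentials (no 'user:password@' part).
    """
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError("expected a URL starting with zk://")
    hosts_part, _, path = url[len(prefix):].partition("/")
    servers = []
    for entry in hosts_part.split(","):
        if ":" in entry:
            host, port = entry.rsplit(":", 1)
            servers.append((host, int(port)))
        else:
            servers.append((entry, default_port))
    return servers, "/" + path


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(url):
    """Map each (host, port) from the zk:// URL to its reachability."""
    servers, _ = parse_zk_url(url)
    return {server: is_reachable(*server) for server in servers}
```

Running `check_all(open("/etc/mesos/zk").read().strip())` on each VM would show whether every ZooKeeper server is reachable from that machine. It only tests that the port accepts connections, not that the ensemble is healthy.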
----------------------------
After starting zookeeper, mesos-master and mesos-slave on all three VMs, I can 
view the mesos webui (10.142.55.190:5050), but the agent count is 0.
After a short time, the mesos page shows this error:
{code}
    Failed to connect to 10.142.55.190:5050!
    Retrying in 16 seconds... 
{code}
(I found that a new leader is elected through zookeeper at short intervals.)
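
To put a number on how often the leader changes, one could extract the timestamps of the `Detected a new leader` entries from a mesos log and compute the gaps between them. This is a rough sketch, not a Mesos tool: the function names are made up for illustration, it assumes each log entry is on a single unwrapped line, and because the log prefix carries no year, the `year` argument is an assumption supplied by the caller.

```python
import re
from datetime import datetime

# Matches e.g. "I0919 15:41:16.405709 13004 detector.cpp:152] Detected a new leader: ..."
LEADER_RE = re.compile(
    r"^\s*[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6}) .*Detected a new leader"
)


def leader_change_times(lines, year=2016):
    """Return the timestamps of all 'Detected a new leader' log entries."""
    times = []
    for line in lines:
        match = LEADER_RE.match(line)
        if match:
            stamp = f"{year}{match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f"))
    return times


def intervals_seconds(times):
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

For example, `intervals_seconds(leader_change_times(open(path)))`, where `path` points at an agent or master INFO log, returns one value per re-election.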

----------------------------------------
mesos-master cmd:
{code}
mesos-master --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --ip="10.142.55.190" 
--log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--port="5050" --quiet="false" --quorum="2" 
--recovery_agent_removal_limit="100%" --registry="replicated_log" 
--registry_fetch_timeout="1mins" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" --webui_dir="/usr/share/mesos/webui" 
--work_dir="/var/lib/mesos" 
--zk="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos"
{code}
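
The `--quorum` value is expected to be a strict majority of the masters taking part in the replicated log. A small sketch of that arithmetic follows; `majority_quorum` and `is_valid_quorum` are hypothetical helper names, not part of Mesos.

```python
def majority_quorum(num_masters):
    """Smallest strict majority of num_masters."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def is_valid_quorum(quorum, num_masters):
    """True if quorum is a strict majority and does not exceed the master count."""
    return num_masters / 2 < quorum <= num_masters
```

With three masters this gives 2, which matches the `--quorum="2"` flag used above.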

mesos-slave cmd:
{code}
mesos-slave --appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" 
--docker="docker" --docker_kill_orphans="true" 
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" 
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
--docker_store_dir="/tmp/mesos/store/docker" 
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname="10.142.55.190" --hostname_lookup="true" 
--http_authenticators="basic" --http_command_executor="false" 
--image_provisioner_backend="copy" --initialize_driver_logging="true" 
--ip="10.142.55.190" --isolation="posix/cpu,posix/mem" --launcher="posix" 
--launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" 
--master="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
--registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" 
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
--systemd_enable_support="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/var/lib/mesos"
{code}

When I run mesos-master from the command line, I get:

{code}
I0919 17:20:19.286264 17550 replica.cpp:673] Replica in VOTING status received 
a broadcasted recover request from (583)@10.142.55.202:5050
F0919 17:20:20.009371 17556 master.cpp:1536] Recovery failed: Failed to recover 
registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
    @     0x7f9db78458dd  google::LogMessage::Fail()
    @     0x7f9db784771d  google::LogMessage::SendToLog()
    @     0x7f9db78454cc  google::LogMessage::Flush()
    @     0x7f9db7848019  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9db6e2dbbc  mesos::internal::master::fail()
    @     0x7f9db6e75b20  
_ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EE
            EEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
    @           0x42a116  process::Future<>::fail()
    @     0x7f9db6e9f705  process::internal::thenf<>()
    @     0x7f9db6efd016  
_ZN7process8internal3runISt8functionIFvRKNS_6FutureIN5mesos8internal8RegistryEEEEEJRS7_EEEvRKSt6vectorIT_SaISE_EEDp
            OT0_
I0919 17:20:20.025172 17553 replica.cpp:673] Replica in VOTING status received 
a broadcasted recover request from (212)@10.142.55.196:5050
    @     0x7f9db6f100de  process::Future<>::fail()
    @     0x7f9db6c57e86  process::internal::run<>()
    @     0x7f9db6f100cb  process::Future<>::fail()
    @     0x7f9db6ef2d34  mesos::internal::master::RegistrarProcess::_recover()
    @     0x7f9db77d5171  process::ProcessManager::resume()
    @     0x7f9db77d5477  
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @     0x7f9db5e439c0  (unknown)
    @     0x7f9db568ae9a  start_thread
    @     0x7f9db53b836d  (unknown)
[1]    17548 abort (core dumped)  mesos-master --agent_ping_timeout="15secs" 
--agent_reregister_timeout="10mins
{code}

It seems mesos-master aborts on this failure, so it gets restarted and a new 
leader is elected through zookeeper???

---------------------------------------------------------

master info log:
{code}
    I0919 15:54:59.677438 13281 http.cpp:2022] Redirecting request for 
/master/state?jsonp=angular.callbacks._1x to the leading master zk3
    I0919 15:55:00.098667 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (768)@10.142.55.202:5050
    I0919 15:55:00.385279 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (185)@10.142.55.196:5050
    I0919 15:55:00.711119 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (771)@10.142.55.202:5050
    I0919 15:55:01.347291 13284 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (188)@10.142.55.196:5050
    I0919 15:55:01.597682 13284 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (774)@10.142.55.202:5050
    I0919 15:55:02.257159 13282 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (191)@10.142.55.196:5050
    I0919 15:55:02.370692 13287 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (777)@10.142.55.202:5050
    I0919 15:55:03.205920 13285 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (780)@10.142.55.202:5050
    I0919 15:55:03.260007 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (194)@10.142.55.196:5050
    I0919 15:55:03.929611 13283 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (783)@10.142.55.202:5050
    I0919 15:55:04.033308 13287 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (197)@10.142.55.196:5050
    I0919 15:55:04.591275 13284 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (200)@10.142.55.196:5050
    I0919 15:55:04.608211 13283 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (786)@10.142.55.202:5050
    I0919 15:55:05.184682 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (789)@10.142.55.202:5050
    I0919 15:55:05.268277 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (203)@10.142.55.196:5050
    I0919 15:55:05.775377 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (206)@10.142.55.196:5050
    I0919 15:55:05.916445 13285 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (792)@10.142.55.202:5050
    I0919 15:55:06.744927 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (209)@10.142.55.196:5050
    I0919 15:55:07.378521 13283 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (5)@10.142.55.202:5050
    I0919 15:55:07.393311 13285 network.hpp:430] ZooKeeper group memberships 
changed
    I0919 15:55:07.393427 13285 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000709' in ZooKeeper
    I0919 15:55:07.393985 13285 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000711' in ZooKeeper
    I0919 15:55:07.394394 13285 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000714' in ZooKeeper
    I0919 15:55:07.394843 13285 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000715' in ZooKeeper
    I0919 15:55:07.395418 13285 network.hpp:478] ZooKeeper group PIDs: { 
log-replica(1)@10.142.55.190:5050, log-replica(1)@10.142.55.196:5050, 
log-replica(1)@10.142.55.202:5050 }
    I0919 15:55:08.178272 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (14)@10.142.55.202:5050
    I0919 15:55:09.059562 13282 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (21)@10.142.55.202:5050
    I0919 15:55:09.700711 13286 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (24)@10.142.55.202:5050
    I0919 15:55:09.742185 13287 http.cpp:381] HTTP GET for /master/state from 
10.142.50.94:59987 with User-Agent='Mozilla/5.0 (Windows NT 6.2; WOW64; 
rv:47.0) Gecko/20100101 Firefox/47.0'
    I0919 15:55:09.742359 13287 http.cpp:2022] Redirecting request for 
/master/state?jsonp=angular.callbacks._1y to the leading master zk3
    I0919 15:55:10.660789 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (30)@10.142.55.202:5050
    I0919 15:55:11.480326 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (34)@10.142.55.202:5050
    I0919 15:55:12.386256 13286 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (37)@10.142.55.202:5050
    I0919 15:55:12.975137 13287 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (42)@10.142.55.202:5050
    I0919 15:55:13.843091 13285 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (47)@10.142.55.202:5050
    I0919 15:55:14.373478 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (51)@10.142.55.202:5050
    I0919 15:55:14.937181 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (54)@10.142.55.202:5050
    I0919 15:55:15.658219 13283 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (58)@10.142.55.202:5050
    I0919 15:55:16.007822 13286 network.hpp:430] ZooKeeper group memberships 
changed
    I0919 15:55:16.007972 13286 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000711' in ZooKeeper
    I0919 15:55:16.010170 13286 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000714' in ZooKeeper
    I0919 15:55:16.011462 13284 detector.cpp:152] Detected a new leader: 
(id='702')
    I0919 15:55:16.011556 13284 group.cpp:706] Trying to get 
'/mesos/json.info_0000000702' in ZooKeeper
    I0919 15:55:16.011968 13286 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000715' in ZooKeeper
    I0919 15:55:16.012526 13286 network.hpp:478] ZooKeeper group PIDs: { 
log-replica(1)@10.142.55.190:5050, log-replica(1)@10.142.55.196:5050, 
log-replica(1)@10.142.55.202:5050 }
    I0919 15:55:16.013156 13284 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.190:5050) is detected
    I0919 15:55:16.013222 13284 master.cpp:1847] The newly elected leader is 
master@10.142.55.190:5050 with id 677967bc-f6f0-46b3-a44e-72eed1befd60
    I0919 15:55:16.013244 13284 master.cpp:1860] Elected as the leading master!
    I0919 15:55:16.013273 13284 master.cpp:1547] Recovering from registrar
    I0919 15:55:16.013352 13284 registrar.cpp:332] Recovering registrar
    I0919 15:55:16.014081 13280 log.cpp:553] Attempting to start the writer
    I0919 15:55:16.014515 13280 replica.cpp:493] Replica received implicit 
promise request from (211)@10.142.55.190:5050 with proposal 1204590
    I0919 15:55:16.018023 13282 consensus.cpp:360] Aborting implicit promise 
request because 2 ignores received
    I0919 15:55:16.018028 13280 leveldb.cpp:304] Persisting metadata (10 bytes) 
to leveldb took 3.469479ms
    I0919 15:55:16.018338 13280 replica.cpp:342] Persisted promised to 1204590
    I0919 15:55:16.018508 13282 log.cpp:565] Could not start the writer, but 
can be retried
    I0919 15:55:16.018645 13282 log.cpp:553] Attempting to start the writer
    I0919 15:55:16.018899 13282 replica.cpp:493] Replica received implicit 
promise request from (215)@10.142.55.190:5050 with proposal 1204591
    I0919 15:55:16.022183 13287 consensus.cpp:360] Aborting implicit promise 
request because 2 ignores received
    I0919 15:55:16.022367 13280 log.cpp:565] Could not start the writer, but 
can be retried
    I0919 15:55:16.022510 13280 log.cpp:553] Attempting to start the writer
    I0919 15:55:16.028880 13282 leveldb.cpp:304] Persisting metadata (10 bytes) 
to leveldb took 9.870818ms
    I0919 15:55:16.029024 13282 replica.cpp:342] Persisted promised to 1204591
    I0919 15:55:16.029428 13286 replica.cpp:493] Replica received implicit 
promise request from (219)@10.142.55.190:5050 with proposal 1204592
    I0919 15:55:16.031600 13280 consensus.cpp:360] Aborting implicit promise 
request because 2 ignores received
    I0919 15:55:16.036208 13283 log.cpp:565] Could not start the writer, but 
can be retried
    I0919 15:55:16.036454 13283 log.cpp:553] Attempting to start the writer
    I0919 15:55:16.040256 13286 leveldb.cpp:304] Persisting metadata (10 bytes) 
to leveldb took 10.783237ms
    I0919 15:55:16.040339 13286 replica.cpp:342] Persisted promised to 1204592
    I0919 15:55:16.040712 13286 replica.cpp:493] Replica received implicit 
promise request from (222)@10.142.55.190:5050 with proposal 1204593
    I0919 15:55:16.042196 13286 leveldb.cpp:304] Persisting metadata (10 bytes) 
to leveldb took 1.435071ms
    I0919 15:55:16.042250 13286 replica.cpp:342] Persisted promised to 1204593
    I0919 15:55:16.042981 13286 consensus.cpp:360] Aborting implicit promise 
request because 2 ignores received
    I0919 15:55:16.043099 13286 log.cpp:565] Could not start the writer, but 
can be retried
    I0919 15:55:16.043303 13283 log.cpp:553] Attempting to start the writer
{code}

All later log entries repeat the same loop:
{code}
    I0919 15:55:16.043676 13286 replica.cpp:493] Replica received implicit 
promise request from (225)@10.142.55.190:5050 with proposal 1204594
    I0919 15:55:16.044122 13286 leveldb.cpp:304] Persisting metadata (10 bytes) 
to leveldb took 404769ns
    I0919 15:55:16.044209 13286 replica.cpp:342] Persisted promised to 1204594
    I0919 15:55:16.044837 13281 consensus.cpp:360] Aborting implicit promise 
request because 2 ignores received
    I0919 15:55:16.044926 13281 log.cpp:565] Could not start the writer, but 
can be retried
    I0919 15:55:16.045038 13281 log.cpp:553] Attempting to start the writer
{code}
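
To see how dominant this retry loop is in the full master log, the entries can be tallied by source location (for example `log.cpp:565`). The sketch below assumes the usual glog line prefix (severity letter, `mmdd`, time, thread id, `file:line]`) and one log entry per unwrapped line; `GLOG_RE` and `count_by_source` are illustrative names only.

```python
import re
from collections import Counter

# Prefix layout: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_RE = re.compile(
    r"^\s*(?P<severity>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<source>[^\s\]]+:\d+)\] (?P<msg>.*)$"
)


def count_by_source(lines):
    """Count log entries per 'file:line' source location."""
    counts = Counter()
    for line in lines:
        match = GLOG_RE.match(line)
        if match:
            counts[match.group("source")] += 1
    return counts
```

Calling `count_by_source(open(path)).most_common(5)` on the master INFO log (with `path` pointing at that file) would list the five most frequent source locations.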

slave info log:
{code}
    Log file created at: 2016/09/19 15:41:16
    Running on machine: ubuntu12
    Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
    I0919 15:41:16.346844 12986 logging.cpp:194] INFO level logging started!
    I0919 15:41:16.363313 12986 containerizer.cpp:196] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni
    I0919 15:41:16.370334 12986 main.cpp:434] Starting Mesos agent
    I0919 15:41:16.371184 12986 slave.cpp:198] Agent started on 
1)@127.0.1.1:5051
    I0919 15:41:16.371636 12986 slave.cpp:199] Flags at startup: 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" 
--docker="docker" --docker_kill_orphans="true" 
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" 
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
--docker_store_dir="/tmp/mesos/store/docker" 
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_command_executor="false" --image_provisioner_backend="copy" 
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" 
--launcher="posix" --launcher_dir="/usr/libexec/mesos" 
--log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" 
--master="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
--registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" 
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
--systemd_enable_support="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/var/lib/mesos"
    I0919 15:41:16.373072 12986 slave.cpp:519] Agent resources: cpus(*):2; 
mem(*):2930; disk(*):4469; ports(*):[31000-32000]
    I0919 15:41:16.373291 12986 slave.cpp:527] Agent attributes: [  ]
    I0919 15:41:16.373347 12986 slave.cpp:532] Agent hostname: ubuntu12
    I0919 15:41:16.379895 13005 state.cpp:57] Recovering state from 
'/var/lib/mesos/meta'
    I0919 15:41:16.382519 13005 group.cpp:349] Group process 
(group(1)@127.0.1.1:5051) connected to ZooKeeper
    I0919 15:41:16.382593 13005 group.cpp:837] Syncing group operations: queue 
size (joins, cancels, datas) = (0, 0, 0)
    I0919 15:41:16.382663 13005 group.cpp:427] Trying to create path '/mesos' 
in ZooKeeper
    I0919 15:41:16.382910 13009 status_update_manager.cpp:200] Recovering 
status update manager
    I0919 15:41:16.383419 13009 containerizer.cpp:522] Recovering containerizer
    I0919 15:41:16.392206 13004 provisioner.cpp:253] Provisioner recovery 
complete
    I0919 15:41:16.392354 13004 slave.cpp:4782] Finished recovery
    I0919 15:41:16.405709 13004 detector.cpp:152] Detected a new leader: 
(id='678')
    I0919 15:41:16.406067 13005 group.cpp:706] Trying to get 
'/mesos/json.info_0000000678' in ZooKeeper
    I0919 15:41:16.407572 13002 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.190:5050) is detected
    I0919 15:41:16.407977 13002 slave.cpp:895] New master detected at 
master@10.142.55.190:5050
    I0919 15:41:16.408043 13002 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:41:16.408140 13002 slave.cpp:927] Detecting new master
    I0919 15:41:16.408223 13005 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:42:08.418956 13006 slave.cpp:3732] master@10.142.55.190:5050 exited
    W0919 15:42:08.419035 13006 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:42:16.374977 13007 slave.cpp:4591] Current disk usage 72.41%. Max 
allowed age: 1.231186482451933days
    I0919 15:42:20.007169 13007 detector.cpp:152] Detected a new leader: 
(id='679')
    I0919 15:42:20.007297 13007 group.cpp:706] Trying to get 
'/mesos/json.info_0000000679' in ZooKeeper
    I0919 15:42:20.008503 13007 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.196:5050) is detected
    I0919 15:42:20.008587 13007 slave.cpp:895] New master detected at 
master@10.142.55.196:5050
    I0919 15:42:20.008610 13007 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:42:20.008703 13007 slave.cpp:927] Detecting new master
    I0919 15:42:20.008750 13007 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:43:16.387984 13005 slave.cpp:4591] Current disk usage 72.41%. Max 
allowed age: 1.231162010606794days
    I0919 15:43:20.081272 13005 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:43:20.081374 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:43:26.855154 13005 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:43:26.855315 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:43:26.855159 13010 process.cpp:2105] Failed to shutdown socket 
with fd 12: Transport endpoint is not connected
    I0919 15:43:32.020196 13002 detector.cpp:152] Detected a new leader: 
(id='682')
    I0919 15:43:32.020300 13002 group.cpp:706] Trying to get 
'/mesos/json.info_0000000682' in ZooKeeper
    I0919 15:43:32.022203 13002 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.202:5050) is detected
    I0919 15:43:32.022302 13002 slave.cpp:895] New master detected at 
master@10.142.55.202:5050
    I0919 15:43:32.022328 13002 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:43:32.022382 13002 slave.cpp:927] Detecting new master
    I0919 15:43:32.022423 13002 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:44:16.389369 13003 slave.cpp:4591] Current disk usage 72.41%. Max 
allowed age: 1.231119184877789days
    I0919 15:44:32.535347 13003 slave.cpp:3732] master@10.142.55.202:5050 exited
    W0919 15:44:32.535522 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:44:42.005375 13002 detector.cpp:152] Detected a new leader: 
(id='684')
    I0919 15:44:42.005496 13002 group.cpp:706] Trying to get 
'/mesos/json.info_0000000684' in ZooKeeper
    I0919 15:44:42.006367 13002 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.190:5050) is detected
    I0919 15:44:42.006492 13002 slave.cpp:895] New master detected at 
master@10.142.55.190:5050
    I0919 15:44:42.006597 13002 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:44:42.006675 13002 slave.cpp:927] Detecting new master
    I0919 15:44:42.006577 13008 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:45:16.400794 13006 slave.cpp:4591] Current disk usage 72.48%. Max 
allowed age: 1.226390000804074days
    I0919 15:45:42.354790 13005 slave.cpp:3732] master@10.142.55.190:5050 exited
    W0919 15:45:42.354857 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:45:54.020563 13002 detector.cpp:152] Detected a new leader: 
(id='687')
    I0919 15:45:54.020756 13002 group.cpp:706] Trying to get 
'/mesos/json.info_0000000687' in ZooKeeper
    I0919 15:45:54.023296 13002 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.196:5050) is detected
    I0919 15:45:54.023455 13002 slave.cpp:895] New master detected at 
master@10.142.55.196:5050
    I0919 15:45:54.023558 13002 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:45:54.023526 13008 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:45:54.023669 13002 slave.cpp:927] Detecting new master
    I0919 15:46:16.402402 13003 slave.cpp:4591] Current disk usage 72.53%. Max 
allowed age: 1.223205601954942days
    I0919 15:46:54.075505 13007 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:46:54.075592 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:46:56.098012 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    I0919 15:46:56.098016 13007 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:46:56.098253 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:46:56.462254 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    I0919 15:46:56.462260 13005 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:46:56.462540 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:47:02.005637 13009 detector.cpp:152] Detected a new leader: 
(id='688')
    I0919 15:47:02.005765 13009 group.cpp:706] Trying to get 
'/mesos/json.info_0000000688' in ZooKeeper
    I0919 15:47:02.006853 13009 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.202:5050) is detected
    I0919 15:47:02.006959 13009 slave.cpp:895] New master detected at 
master@10.142.55.202:5050
    I0919 15:47:02.006986 13009 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:47:02.007025 13009 slave.cpp:927] Detecting new master
    I0919 15:47:02.007061 13009 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:47:16.406669 13008 slave.cpp:4591] Current disk usage 72.53%. Max 
allowed age: 1.223184189090440days
    I0919 15:48:02.950891 13005 slave.cpp:3732] master@10.142.55.202:5050 exited
    W0919 15:48:02.950994 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:48:12.006634 13005 detector.cpp:152] Detected a new leader: 
(id='690')
    I0919 15:48:12.006817 13003 group.cpp:706] Trying to get 
'/mesos/json.info_0000000690' in ZooKeeper
    I0919 15:48:12.007987 13003 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.190:5050) is detected
    I0919 15:48:12.008126 13003 slave.cpp:895] New master detected at 
master@10.142.55.190:5050
    I0919 15:48:12.008210 13003 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:48:12.008280 13003 slave.cpp:927] Detecting new master
    I0919 15:48:12.008191 13008 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:48:16.409266 13003 slave.cpp:4591] Current disk usage 72.54%. Max 
allowed age: 1.222480623542604days
    I0919 15:49:12.379010 13009 slave.cpp:3732] master@10.142.55.190:5050 exited
    W0919 15:49:12.379149 13009 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:49:12.379233 13010 process.cpp:2105] Failed to shutdown socket 
with fd 12: Transport endpoint is not connected
    I0919 15:49:16.413767 13007 slave.cpp:4591] Current disk usage 72.64%. Max 
allowed age: 1.215032005677465days
    I0919 15:49:24.016290 13007 detector.cpp:152] Detected a new leader: 
(id='693')
    I0919 15:49:24.016417 13007 group.cpp:706] Trying to get 
'/mesos/json.info_0000000693' in ZooKeeper
    I0919 15:49:24.018273 13007 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.196:5050) is detected
    I0919 15:49:24.018437 13007 slave.cpp:895] New master detected at 
master@10.142.55.196:5050
    I0919 15:49:24.018523 13007 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:49:24.018604 13007 slave.cpp:927] Detecting new master
    I0919 15:49:24.018496 13008 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:50:16.416391 13008 slave.cpp:4591] Current disk usage 72.64%. Max 
allowed age: 1.215016710774248days
    I0919 15:50:24.065268 13003 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:50:24.065342 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:50:24.485752 13004 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:50:24.485839 13004 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:24.485977 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    I0919 15:50:28.343647 13003 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:50:28.343719 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:28.343819 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    I0919 15:50:31.545099 13005 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:50:31.545171 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:31.545284 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    I0919 15:50:32.007096 13008 detector.cpp:152] Detected a new leader: 
(id='694')
    I0919 15:50:32.007195 13008 group.cpp:706] Trying to get 
'/mesos/json.info_0000000694' in ZooKeeper
    I0919 15:50:32.009881 13008 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.202:5050) is detected
    I0919 15:50:32.009970 13008 slave.cpp:895] New master detected at 
master@10.142.55.202:5050
    I0919 15:50:32.009994 13008 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:50:32.010030 13008 slave.cpp:927] Detecting new master
    I0919 15:50:32.010079 13008 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:51:16.417846 13006 slave.cpp:4591] Current disk usage 72.64%. Max 
allowed age: 1.214964708103322days
    I0919 15:51:32.560317 13003 slave.cpp:3732] master@10.142.55.202:5050 exited
    W0919 15:51:32.560410 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:51:42.005147 13009 detector.cpp:152] Detected a new leader: 
(id='696')
    I0919 15:51:42.005265 13009 group.cpp:706] Trying to get 
'/mesos/json.info_0000000696' in ZooKeeper
    I0919 15:51:42.006824 13009 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.190:5050) is detected
    I0919 15:51:42.006904 13009 slave.cpp:895] New master detected at 
master@10.142.55.190:5050
    I0919 15:51:42.006928 13009 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:51:42.006963 13009 slave.cpp:927] Detecting new master
    I0919 15:51:42.006999 13009 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:52:16.419373 13003 slave.cpp:4591] Current disk usage 72.71%. Max 
allowed age: 1.209981628636250days
    I0919 15:52:42.336305 13002 slave.cpp:3732] master@10.142.55.190:5050 exited
    W0919 15:52:42.336426 13002 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:52:54.005267 13005 detector.cpp:152] Detected a new leader: 
(id='699')
    I0919 15:52:54.005408 13005 group.cpp:706] Trying to get 
'/mesos/json.info_0000000699' in ZooKeeper
    I0919 15:52:54.006206 13005 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.196:5050) is detected
    I0919 15:52:54.006285 13005 slave.cpp:895] New master detected at 
master@10.142.55.196:5050
    I0919 15:52:54.006309 13005 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:52:54.006398 13005 slave.cpp:927] Detecting new master
    I0919 15:52:54.006451 13005 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:53:16.420258 13005 slave.cpp:4591] Current disk usage 72.76%. Max 
allowed age: 1.206748286096840days
    I0919 15:53:54.071012 13005 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:53:54.071143 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:54:01.105780 13002 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:54:01.105854 13002 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:54:01.105970 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    I0919 15:54:05.733837 13007 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:54:05.733932 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:54:05.734071 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    E0919 15:54:05.818560 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    I0919 15:54:05.818583 13003 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:54:05.818758 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:54:06.004385 13009 detector.cpp:152] Detected a new leader: 
(id='700')
    I0919 15:54:06.004494 13009 group.cpp:706] Trying to get 
'/mesos/json.info_0000000700' in ZooKeeper
    I0919 15:54:06.005511 13009 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.202:5050) is detected
    I0919 15:54:06.005586 13009 slave.cpp:895] New master detected at 
master@10.142.55.202:5050
    I0919 15:54:06.005609 13009 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:54:06.005676 13009 slave.cpp:927] Detecting new master
    I0919 15:54:06.005720 13009 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:54:16.423193 13002 slave.cpp:4591] Current disk usage 72.76%. Max 
allowed age: 1.206699342406551days
{code}

slave warn log:
{code}
    Log file created at: 2016/09/19 15:42:08
    Running on machine: ubuntu12
    Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
    W0919 15:42:08.419035 13006 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:43:20.081374 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:43:26.855315 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:43:26.855159 13010 process.cpp:2105] Failed to shutdown socket 
with fd 12: Transport endpoint is not connected
    W0919 15:44:32.535522 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:45:42.354857 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:46:54.075592 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:46:56.098012 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    W0919 15:46:56.098253 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:46:56.462254 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    W0919 15:46:56.462540 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:48:02.950994 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:49:12.379149 13009 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:49:12.379233 13010 process.cpp:2105] Failed to shutdown socket 
with fd 12: Transport endpoint is not connected
    W0919 15:50:24.065342 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:50:24.485839 13004 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:24.485977 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    W0919 15:50:28.343719 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:28.343819 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    W0919 15:50:31.545171 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:31.545284 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    W0919 15:51:32.560410 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:52:42.336426 13002 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:53:54.071143 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:54:01.105854 13002 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:54:01.105970 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    W0919 15:54:05.733932 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:54:05.734071 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    E0919 15:54:05.818560 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    W0919 15:54:05.818758 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:55:06.821486 13009 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
{code}


  was:
I followed this 
[doc|https://open.mesosphere.com/getting-started/install/#verifying-installation]
 to set up a Mesos cluster.

There are three VMs (Ubuntu 12, CentOS 6.5, CentOS 7.2).

    $ cat /etc/hosts
    10.142.55.190 zk1
    10.142.55.196 zk2
    10.142.55.202 zk3

Config on each machine:

    $ cat /etc/mesos/zk
    zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos

----------------------------
After starting ZooKeeper, mesos-master, and mesos-slave on all three VMs, I can 
view the Mesos web UI (10.142.55.190:5050), but the agent count is 0.
After a short time, the Mesos page shows an error:

    Failed to connect to 10.142.55.190:5050!
    Retrying in 16 seconds... 

(I found that a new leader was being elected through ZooKeeper at short intervals.)

----------------------------------------
mesos-master cmd:
```
mesos-master --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --ip="10.142.55.190" 
--log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--port="5050" --quiet="false" --quorum="2" 
--recovery_agent_removal_limit="100%" --registry="replicated_log" 
--registry_fetch_timeout="1mins" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" --webui_dir="/usr/share/mesos/webui" 
--work_dir="/var/lib/mesos" 
--zk="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos"
```
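
As a quick sanity check on the flags above, here is a minimal sketch that parses the command line and tests whether `--quorum` is a strict majority of the masters. The helper names `parse_flags` and `quorum_is_majority` are hypothetical, and the master count of 3 is an assumption (one mesos-master per VM, as described above). It only rules out one possible misconfiguration and does not by itself explain the failure.

```python
import shlex


def parse_flags(cmdline):
    """Turn '--name="value" ...' into a dict; tokens without '=' are skipped."""
    flags = {}
    for token in shlex.split(cmdline):
        if token.startswith("--") and "=" in token:
            name, value = token[2:].split("=", 1)
            flags[name] = value
    return flags


def quorum_is_majority(quorum, num_masters):
    """True when quorum is more than half of the masters and at most all of them."""
    return num_masters // 2 < quorum <= num_masters


# A shortened copy of the mesos-master command line quoted above.
cmd = ('mesos-master --ip="10.142.55.190" --port="5050" --quorum="2" '
       '--registry="replicated_log" --registry_fetch_timeout="1mins"')
flags = parse_flags(cmd)

num_masters = 3  # assumption: one mesos-master runs on each of the three VMs
print(quorum_is_majority(int(flags["quorum"]), num_masters))
```

With three masters, `--quorum="2"` passes this check, so the quorum value itself looks consistent with the cluster size.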


mesos-slave cmd:
```
mesos-slave --appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" 
--docker="docker" --docker_kill_orphans="true" 
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" 
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
--docker_store_dir="/tmp/mesos/store/docker" 
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname="10.142.55.190" --hostname_lookup="true" 
--http_authenticators="basic" --http_command_executor="false" 
--image_provisioner_backend="copy" --initialize_driver_logging="true" 
--ip="10.142.55.190" --isolation="posix/cpu,posix/mem" --launcher="posix" 
--launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" 
--master="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
--registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" 
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
--systemd_enable_support="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/var/lib/mesos"
```

When I run mesos-master from the command line, I get:

```
I0919 17:20:19.286264 17550 replica.cpp:673] Replica in VOTING status received 
a broadcasted recover request from (583)@10.142.55.202:5050
F0919 17:20:20.009371 17556 master.cpp:1536] Recovery failed: Failed to recover 
registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
    @     0x7f9db78458dd  google::LogMessage::Fail()
    @     0x7f9db784771d  google::LogMessage::SendToLog()
    @     0x7f9db78454cc  google::LogMessage::Flush()
    @     0x7f9db7848019  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9db6e2dbbc  mesos::internal::master::fail()
    @     0x7f9db6e75b20  
_ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EE
            EEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
    @           0x42a116  process::Future<>::fail()
    @     0x7f9db6e9f705  process::internal::thenf<>()
    @     0x7f9db6efd016  
_ZN7process8internal3runISt8functionIFvRKNS_6FutureIN5mesos8internal8RegistryEEEEEJRS7_EEEvRKSt6vectorIT_SaISE_EEDp
            OT0_
I0919 17:20:20.025172 17553 replica.cpp:673] Replica in VOTING status received 
a broadcasted recover request from (212)@10.142.55.196:5050
    @     0x7f9db6f100de  process::Future<>::fail()
    @     0x7f9db6c57e86  process::internal::run<>()
    @     0x7f9db6f100cb  process::Future<>::fail()
    @     0x7f9db6ef2d34  mesos::internal::master::RegistrarProcess::_recover()
    @     0x7f9db77d5171  process::ProcessManager::resume()
    @     0x7f9db77d5477  
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @     0x7f9db5e439c0  (unknown)
    @     0x7f9db568ae9a  start_thread
    @     0x7f9db53b836d  (unknown)
[1]    17548 abort (core dumped)  mesos-master --agent_ping_timeout="15secs" 
--agent_reregister_timeout="10mins
```

It seems mesos-master quits on this failure, so it is restarted and ZooKeeper 
elects a new leader?
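
One way to test this hypothesis is to measure how long each elected master lasts before the agent sees it exit. The following is a rough sketch, not part of Mesos: the helper names `unwrap` and `leader_tenures` are hypothetical, and it assumes log records in the glog format shown in this report, possibly wrapped across lines by the mail client. The sample records are copied from the slave info log.

```python
import re
from datetime import datetime

# glog record start, e.g. "I0919 15:50:32.009881 13008 zookeeper.cpp:259] ..."
START = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6} ")
PREFIX = re.compile(r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6}) \d+ \S+\]\s*(.*)$")
ELECTED = re.compile(r"A new leading master \(UPID=(master@[\d.]+:\d+)\) is detected")
EXITED = re.compile(r"^(master@[\d.]+:\d+) exited$")


def unwrap(raw_lines):
    """Rejoin records that were wrapped onto several lines."""
    records = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        if START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def leader_tenures(raw_lines):
    """Return (master, seconds) pairs: time from election to the first 'exited'.

    Timestamps carry no date, so a log that crosses midnight is not handled.
    """
    elected_at = {}
    tenures = []
    for record in unwrap(raw_lines):
        m = PREFIX.match(record)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        msg = m.group(2)
        elected = ELECTED.search(msg)
        exited = EXITED.match(msg)
        if elected:
            elected_at[elected.group(1)] = ts
        elif exited and exited.group(1) in elected_at:
            start = elected_at.pop(exited.group(1))
            tenures.append((exited.group(1), (ts - start).total_seconds()))
    return tenures


sample = """\
I0919 15:50:32.009881 13008 zookeeper.cpp:259] A new leading master
(UPID=master@10.142.55.202:5050) is detected
I0919 15:51:32.560317 13003 slave.cpp:3732] master@10.142.55.202:5050 exited
I0919 15:51:42.006824 13009 zookeeper.cpp:259] A new leading master
(UPID=master@10.142.55.190:5050) is detected
I0919 15:52:42.336305 13002 slave.cpp:3732] master@10.142.55.190:5050 exited
""".splitlines()

for master, seconds in leader_tenures(sample):
    print(f"{master} led for {seconds:.1f}s")
```

On these sample records each leader lasts about 60 seconds, which lines up with `--registry_fetch_timeout="1mins"` and the fatal "Failed to perform fetch within 1mins" message above. That is only a correlation in the quoted logs, not a confirmed root cause.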

---------------------------------------------------------

master info log:

    I0919 15:54:59.677438 13281 http.cpp:2022] Redirecting request for 
/master/state?jsonp=angular.callbacks._1x to the leading master zk3
    I0919 15:55:00.098667 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (768)@10.142.55.202:5050
    I0919 15:55:00.385279 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (185)@10.142.55.196:5050
    I0919 15:55:00.711119 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (771)@10.142.55.202:5050
    I0919 15:55:01.347291 13284 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (188)@10.142.55.196:5050
    I0919 15:55:01.597682 13284 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (774)@10.142.55.202:5050
    I0919 15:55:02.257159 13282 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (191)@10.142.55.196:5050
    I0919 15:55:02.370692 13287 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (777)@10.142.55.202:5050
    I0919 15:55:03.205920 13285 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (780)@10.142.55.202:5050
    I0919 15:55:03.260007 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (194)@10.142.55.196:5050
    I0919 15:55:03.929611 13283 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (783)@10.142.55.202:5050
    I0919 15:55:04.033308 13287 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (197)@10.142.55.196:5050
    I0919 15:55:04.591275 13284 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (200)@10.142.55.196:5050
    I0919 15:55:04.608211 13283 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (786)@10.142.55.202:5050
    I0919 15:55:05.184682 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (789)@10.142.55.202:5050
    I0919 15:55:05.268277 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (203)@10.142.55.196:5050
    I0919 15:55:05.775377 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (206)@10.142.55.196:5050
    I0919 15:55:05.916445 13285 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (792)@10.142.55.202:5050
    I0919 15:55:06.744927 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (209)@10.142.55.196:5050
    I0919 15:55:07.378521 13283 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (5)@10.142.55.202:5050
    I0919 15:55:07.393311 13285 network.hpp:430] ZooKeeper group memberships 
changed
    I0919 15:55:07.393427 13285 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000709' in ZooKeeper
    I0919 15:55:07.393985 13285 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000711' in ZooKeeper
    I0919 15:55:07.394394 13285 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000714' in ZooKeeper
    I0919 15:55:07.394843 13285 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000715' in ZooKeeper
    I0919 15:55:07.395418 13285 network.hpp:478] ZooKeeper group PIDs: { 
log-replica(1)@10.142.55.190:5050, log-replica(1)@10.142.55.196:5050, 
log-replica(1)@10.142.55.202:5050 }
    I0919 15:55:08.178272 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (14)@10.142.55.202:5050
    I0919 15:55:09.059562 13282 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (21)@10.142.55.202:5050
    I0919 15:55:09.700711 13286 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (24)@10.142.55.202:5050
    I0919 15:55:09.742185 13287 http.cpp:381] HTTP GET for /master/state from 
10.142.50.94:59987 with User-Agent='Mozilla/5.0 (Windows NT 6.2; WOW64; 
rv:47.0) Gecko/20100101 Firefox/47.0'
    I0919 15:55:09.742359 13287 http.cpp:2022] Redirecting request for 
/master/state?jsonp=angular.callbacks._1y to the leading master zk3
    I0919 15:55:10.660789 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (30)@10.142.55.202:5050
    I0919 15:55:11.480326 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (34)@10.142.55.202:5050
    I0919 15:55:12.386256 13286 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (37)@10.142.55.202:5050
    I0919 15:55:12.975137 13287 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (42)@10.142.55.202:5050
    I0919 15:55:13.843091 13285 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (47)@10.142.55.202:5050
    I0919 15:55:14.373478 13281 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (51)@10.142.55.202:5050
    I0919 15:55:14.937181 13280 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (54)@10.142.55.202:5050
    I0919 15:55:15.658219 13283 replica.cpp:673] Replica in VOTING status 
received a broadcasted recover request from (58)@10.142.55.202:5050
    I0919 15:55:16.007822 13286 network.hpp:430] ZooKeeper group memberships 
changed
    I0919 15:55:16.007972 13286 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000711' in ZooKeeper
    I0919 15:55:16.010170 13286 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000714' in ZooKeeper
    I0919 15:55:16.011462 13284 detector.cpp:152] Detected a new leader: 
(id='702')
    I0919 15:55:16.011556 13284 group.cpp:706] Trying to get 
'/mesos/json.info_0000000702' in ZooKeeper
    I0919 15:55:16.011968 13286 group.cpp:706] Trying to get 
'/mesos/log_replicas/0000000715' in ZooKeeper
    I0919 15:55:16.012526 13286 network.hpp:478] ZooKeeper group PIDs: { 
log-replica(1)@10.142.55.190:5050, log-replica(1)@10.142.55.196:5050, 
log-replica(1)@10.142.55.202:5050 }
    I0919 15:55:16.013156 13284 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.190:5050) is detected
    I0919 15:55:16.013222 13284 master.cpp:1847] The newly elected leader is 
master@10.142.55.190:5050 with id 677967bc-f6f0-46b3-a44e-72eed1befd60
    I0919 15:55:16.013244 13284 master.cpp:1860] Elected as the leading master!
    I0919 15:55:16.013273 13284 master.cpp:1547] Recovering from registrar
    I0919 15:55:16.013352 13284 registrar.cpp:332] Recovering registrar
    I0919 15:55:16.014081 13280 log.cpp:553] Attempting to start the writer
    I0919 15:55:16.014515 13280 replica.cpp:493] Replica received implicit 
promise request from (211)@10.142.55.190:5050 with proposal 1204590
    I0919 15:55:16.018023 13282 consensus.cpp:360] Aborting implicit promise 
request because 2 ignores received
    I0919 15:55:16.018028 13280 leveldb.cpp:304] Persisting metadata (10 bytes) 
to leveldb took 3.469479ms
    I0919 15:55:16.018338 13280 replica.cpp:342] Persisted promised to 1204590
    I0919 15:55:16.018508 13282 log.cpp:565] Could not start the writer, but 
can be retried
    I0919 15:55:16.018645 13282 log.cpp:553] Attempting to start the writer
    I0919 15:55:16.018899 13282 replica.cpp:493] Replica received implicit 
promise request from (215)@10.142.55.190:5050 with proposal 1204591
    I0919 15:55:16.022183 13287 consensus.cpp:360] Aborting implicit promise 
request because 2 ignores received
    I0919 15:55:16.022367 13280 log.cpp:565] Could not start the writer, but 
can be retried
    I0919 15:55:16.022510 13280 log.cpp:553] Attempting to start the writer
    I0919 15:55:16.028880 13282 leveldb.cpp:304] Persisting metadata (10 bytes) 
to leveldb took 9.870818ms
    I0919 15:55:16.029024 13282 replica.cpp:342] Persisted promised to 1204591
    I0919 15:55:16.029428 13286 replica.cpp:493] Replica received implicit 
promise request from (219)@10.142.55.190:5050 with proposal 1204592
    I0919 15:55:16.031600 13280 consensus.cpp:360] Aborting implicit promise 
request because 2 ignores received
    I0919 15:55:16.036208 13283 log.cpp:565] Could not start the writer, but 
can be retried
    I0919 15:55:16.036454 13283 log.cpp:553] Attempting to start the writer
    I0919 15:55:16.040256 13286 leveldb.cpp:304] Persisting metadata (10 bytes) 
to leveldb took 10.783237ms
    I0919 15:55:16.040339 13286 replica.cpp:342] Persisted promised to 1204592
    I0919 15:55:16.040712 13286 replica.cpp:493] Replica received implicit 
promise request from (222)@10.142.55.190:5050 with proposal 1204593
    I0919 15:55:16.042196 13286 leveldb.cpp:304] Persisting metadata (10 bytes) 
to leveldb took 1.435071ms
    I0919 15:55:16.042250 13286 replica.cpp:342] Persisted promised to 1204593
    I0919 15:55:16.042981 13286 consensus.cpp:360] Aborting implicit promise 
request because 2 ignores received
    I0919 15:55:16.043099 13286 log.cpp:565] Could not start the writer, but 
can be retried
    I0919 15:55:16.043303 13283 log.cpp:553] Attempting to start the writer


All later log entries repeat the same loop:

    I0919 15:55:16.043676 13286 replica.cpp:493] Replica received implicit 
promise request from (225)@10.142.55.190:5050 with proposal 1204594
    I0919 15:55:16.044122 13286 leveldb.cpp:304] Persisting metadata (10 bytes) 
to leveldb took 404769ns
    I0919 15:55:16.044209 13286 replica.cpp:342] Persisted promised to 1204594
    I0919 15:55:16.044837 13281 consensus.cpp:360] Aborting implicit promise 
request because 2 ignores received
    I0919 15:55:16.044926 13281 log.cpp:565] Could not start the writer, but 
can be retried
    I0919 15:55:16.045038 13281 log.cpp:553] Attempting to start the writer
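
To quantify the loop above, a small sketch like the following can count the writer start attempts and failures and report the range of proposal numbers. The function name `summarize_writer_loop` is hypothetical. It assumes only the message texts visible in the master info log and collapses whitespace first because the records are wrapped by the mail client.

```python
import re

PROPOSAL = re.compile(r"promise request from \S+ with proposal (\d+)")


def summarize_writer_loop(text):
    """Count writer start attempts/failures and report the proposal range."""
    # Records are wrapped across lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    proposals = [int(p) for p in PROPOSAL.findall(flat)]
    return {
        "attempts": flat.count("Attempting to start the writer"),
        "failures": flat.count("Could not start the writer"),
        "first_proposal": min(proposals) if proposals else None,
        "last_proposal": max(proposals) if proposals else None,
    }


# A few records copied from the master info log above.
sample_log = """\
I0919 15:55:16.040712 13286 replica.cpp:493] Replica received implicit
promise request from (222)@10.142.55.190:5050 with proposal 1204593
I0919 15:55:16.043099 13286 log.cpp:565] Could not start the writer, but
can be retried
I0919 15:55:16.043303 13283 log.cpp:553] Attempting to start the writer
I0919 15:55:16.043676 13286 replica.cpp:493] Replica received implicit
promise request from (225)@10.142.55.190:5050 with proposal 1204594
I0919 15:55:16.044926 13281 log.cpp:565] Could not start the writer, but
can be retried
I0919 15:55:16.045038 13281 log.cpp:553] Attempting to start the writer
"""

print(summarize_writer_loop(sample_log))
```

Running it over the full master log would show how many proposals were burned per second while the writer kept failing to start.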


slave info log:

    Log file created at: 2016/09/19 15:41:16
    Running on machine: ubuntu12
    Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
    I0919 15:41:16.346844 12986 logging.cpp:194] INFO level logging started!
    I0919 15:41:16.363313 12986 containerizer.cpp:196] Using isolation: 
posix/cpu,posix/mem,filesystem/posix,network/cni
    I0919 15:41:16.370334 12986 main.cpp:434] Starting Mesos agent
    I0919 15:41:16.371184 12986 slave.cpp:198] Agent started on 
1)@127.0.1.1:5051
    I0919 15:41:16.371636 12986 slave.cpp:199] Flags at startup: 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" 
--docker="docker" --docker_kill_orphans="true" 
--docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" 
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
--docker_store_dir="/tmp/mesos/store/docker" 
--docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_command_executor="false" --image_provisioner_backend="copy" 
--initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" 
--launcher="posix" --launcher_dir="/usr/libexec/mesos" 
--log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" 
--master="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
--registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" 
--sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
--systemd_enable_support="true" 
--systemd_runtime_directory="/run/systemd/system" --version="false" 
--work_dir="/var/lib/mesos"
    I0919 15:41:16.373072 12986 slave.cpp:519] Agent resources: cpus(*):2; 
mem(*):2930; disk(*):4469; ports(*):[31000-32000]
    I0919 15:41:16.373291 12986 slave.cpp:527] Agent attributes: [  ]
    I0919 15:41:16.373347 12986 slave.cpp:532] Agent hostname: ubuntu12
    I0919 15:41:16.379895 13005 state.cpp:57] Recovering state from 
'/var/lib/mesos/meta'
    I0919 15:41:16.382519 13005 group.cpp:349] Group process 
(group(1)@127.0.1.1:5051) connected to ZooKeeper
    I0919 15:41:16.382593 13005 group.cpp:837] Syncing group operations: queue 
size (joins, cancels, datas) = (0, 0, 0)
    I0919 15:41:16.382663 13005 group.cpp:427] Trying to create path '/mesos' 
in ZooKeeper
    I0919 15:41:16.382910 13009 status_update_manager.cpp:200] Recovering 
status update manager
    I0919 15:41:16.383419 13009 containerizer.cpp:522] Recovering containerizer
    I0919 15:41:16.392206 13004 provisioner.cpp:253] Provisioner recovery 
complete
    I0919 15:41:16.392354 13004 slave.cpp:4782] Finished recovery
    I0919 15:41:16.405709 13004 detector.cpp:152] Detected a new leader: 
(id='678')
    I0919 15:41:16.406067 13005 group.cpp:706] Trying to get 
'/mesos/json.info_0000000678' in ZooKeeper
    I0919 15:41:16.407572 13002 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.190:5050) is detected
    I0919 15:41:16.407977 13002 slave.cpp:895] New master detected at 
master@10.142.55.190:5050
    I0919 15:41:16.408043 13002 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:41:16.408140 13002 slave.cpp:927] Detecting new master
    I0919 15:41:16.408223 13005 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:42:08.418956 13006 slave.cpp:3732] master@10.142.55.190:5050 exited
    W0919 15:42:08.419035 13006 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:42:16.374977 13007 slave.cpp:4591] Current disk usage 72.41%. Max 
allowed age: 1.231186482451933days
    I0919 15:42:20.007169 13007 detector.cpp:152] Detected a new leader: 
(id='679')
    I0919 15:42:20.007297 13007 group.cpp:706] Trying to get 
'/mesos/json.info_0000000679' in ZooKeeper
    I0919 15:42:20.008503 13007 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.196:5050) is detected
    I0919 15:42:20.008587 13007 slave.cpp:895] New master detected at 
master@10.142.55.196:5050
    I0919 15:42:20.008610 13007 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:42:20.008703 13007 slave.cpp:927] Detecting new master
    I0919 15:42:20.008750 13007 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:43:16.387984 13005 slave.cpp:4591] Current disk usage 72.41%. Max 
allowed age: 1.231162010606794days
    I0919 15:43:20.081272 13005 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:43:20.081374 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:43:26.855154 13005 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:43:26.855315 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:43:26.855159 13010 process.cpp:2105] Failed to shutdown socket 
with fd 12: Transport endpoint is not connected
    I0919 15:43:32.020196 13002 detector.cpp:152] Detected a new leader: 
(id='682')
    I0919 15:43:32.020300 13002 group.cpp:706] Trying to get 
'/mesos/json.info_0000000682' in ZooKeeper
    I0919 15:43:32.022203 13002 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.202:5050) is detected
    I0919 15:43:32.022302 13002 slave.cpp:895] New master detected at 
master@10.142.55.202:5050
    I0919 15:43:32.022328 13002 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:43:32.022382 13002 slave.cpp:927] Detecting new master
    I0919 15:43:32.022423 13002 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:44:16.389369 13003 slave.cpp:4591] Current disk usage 72.41%. Max 
allowed age: 1.231119184877789days
    I0919 15:44:32.535347 13003 slave.cpp:3732] master@10.142.55.202:5050 exited
    W0919 15:44:32.535522 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:44:42.005375 13002 detector.cpp:152] Detected a new leader: 
(id='684')
    I0919 15:44:42.005496 13002 group.cpp:706] Trying to get 
'/mesos/json.info_0000000684' in ZooKeeper
    I0919 15:44:42.006367 13002 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.190:5050) is detected
    I0919 15:44:42.006492 13002 slave.cpp:895] New master detected at 
master@10.142.55.190:5050
    I0919 15:44:42.006597 13002 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:44:42.006675 13002 slave.cpp:927] Detecting new master
    I0919 15:44:42.006577 13008 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:45:16.400794 13006 slave.cpp:4591] Current disk usage 72.48%. Max 
allowed age: 1.226390000804074days
    I0919 15:45:42.354790 13005 slave.cpp:3732] master@10.142.55.190:5050 exited
    W0919 15:45:42.354857 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:45:54.020563 13002 detector.cpp:152] Detected a new leader: 
(id='687')
    I0919 15:45:54.020756 13002 group.cpp:706] Trying to get 
'/mesos/json.info_0000000687' in ZooKeeper
    I0919 15:45:54.023296 13002 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.196:5050) is detected
    I0919 15:45:54.023455 13002 slave.cpp:895] New master detected at 
master@10.142.55.196:5050
    I0919 15:45:54.023558 13002 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:45:54.023526 13008 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:45:54.023669 13002 slave.cpp:927] Detecting new master
    I0919 15:46:16.402402 13003 slave.cpp:4591] Current disk usage 72.53%. Max 
allowed age: 1.223205601954942days
    I0919 15:46:54.075505 13007 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:46:54.075592 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:46:56.098012 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    I0919 15:46:56.098016 13007 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:46:56.098253 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:46:56.462254 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    I0919 15:46:56.462260 13005 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:46:56.462540 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:47:02.005637 13009 detector.cpp:152] Detected a new leader: 
(id='688')
    I0919 15:47:02.005765 13009 group.cpp:706] Trying to get 
'/mesos/json.info_0000000688' in ZooKeeper
    I0919 15:47:02.006853 13009 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.202:5050) is detected
    I0919 15:47:02.006959 13009 slave.cpp:895] New master detected at 
master@10.142.55.202:5050
    I0919 15:47:02.006986 13009 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:47:02.007025 13009 slave.cpp:927] Detecting new master
    I0919 15:47:02.007061 13009 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:47:16.406669 13008 slave.cpp:4591] Current disk usage 72.53%. Max 
allowed age: 1.223184189090440days
    I0919 15:48:02.950891 13005 slave.cpp:3732] master@10.142.55.202:5050 exited
    W0919 15:48:02.950994 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:48:12.006634 13005 detector.cpp:152] Detected a new leader: 
(id='690')
    I0919 15:48:12.006817 13003 group.cpp:706] Trying to get 
'/mesos/json.info_0000000690' in ZooKeeper
    I0919 15:48:12.007987 13003 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.190:5050) is detected
    I0919 15:48:12.008126 13003 slave.cpp:895] New master detected at 
master@10.142.55.190:5050
    I0919 15:48:12.008210 13003 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:48:12.008280 13003 slave.cpp:927] Detecting new master
    I0919 15:48:12.008191 13008 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:48:16.409266 13003 slave.cpp:4591] Current disk usage 72.54%. Max 
allowed age: 1.222480623542604days
    I0919 15:49:12.379010 13009 slave.cpp:3732] master@10.142.55.190:5050 exited
    W0919 15:49:12.379149 13009 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:49:12.379233 13010 process.cpp:2105] Failed to shutdown socket 
with fd 12: Transport endpoint is not connected
    I0919 15:49:16.413767 13007 slave.cpp:4591] Current disk usage 72.64%. Max 
allowed age: 1.215032005677465days
    I0919 15:49:24.016290 13007 detector.cpp:152] Detected a new leader: 
(id='693')
    I0919 15:49:24.016417 13007 group.cpp:706] Trying to get 
'/mesos/json.info_0000000693' in ZooKeeper
    I0919 15:49:24.018273 13007 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.196:5050) is detected
    I0919 15:49:24.018437 13007 slave.cpp:895] New master detected at 
master@10.142.55.196:5050
    I0919 15:49:24.018523 13007 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:49:24.018604 13007 slave.cpp:927] Detecting new master
    I0919 15:49:24.018496 13008 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:50:16.416391 13008 slave.cpp:4591] Current disk usage 72.64%. Max 
allowed age: 1.215016710774248days
    I0919 15:50:24.065268 13003 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:50:24.065342 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:50:24.485752 13004 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:50:24.485839 13004 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:24.485977 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    I0919 15:50:28.343647 13003 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:50:28.343719 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:28.343819 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    I0919 15:50:31.545099 13005 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:50:31.545171 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:31.545284 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    I0919 15:50:32.007096 13008 detector.cpp:152] Detected a new leader: 
(id='694')
    I0919 15:50:32.007195 13008 group.cpp:706] Trying to get 
'/mesos/json.info_0000000694' in ZooKeeper
    I0919 15:50:32.009881 13008 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.202:5050) is detected
    I0919 15:50:32.009970 13008 slave.cpp:895] New master detected at 
master@10.142.55.202:5050
    I0919 15:50:32.009994 13008 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:50:32.010030 13008 slave.cpp:927] Detecting new master
    I0919 15:50:32.010079 13008 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:51:16.417846 13006 slave.cpp:4591] Current disk usage 72.64%. Max 
allowed age: 1.214964708103322days
    I0919 15:51:32.560317 13003 slave.cpp:3732] master@10.142.55.202:5050 exited
    W0919 15:51:32.560410 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:51:42.005147 13009 detector.cpp:152] Detected a new leader: 
(id='696')
    I0919 15:51:42.005265 13009 group.cpp:706] Trying to get 
'/mesos/json.info_0000000696' in ZooKeeper
    I0919 15:51:42.006824 13009 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.190:5050) is detected
    I0919 15:51:42.006904 13009 slave.cpp:895] New master detected at 
master@10.142.55.190:5050
    I0919 15:51:42.006928 13009 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:51:42.006963 13009 slave.cpp:927] Detecting new master
    I0919 15:51:42.006999 13009 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:52:16.419373 13003 slave.cpp:4591] Current disk usage 72.71%. Max 
allowed age: 1.209981628636250days
    I0919 15:52:42.336305 13002 slave.cpp:3732] master@10.142.55.190:5050 exited
    W0919 15:52:42.336426 13002 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:52:54.005267 13005 detector.cpp:152] Detected a new leader: 
(id='699')
    I0919 15:52:54.005408 13005 group.cpp:706] Trying to get 
'/mesos/json.info_0000000699' in ZooKeeper
    I0919 15:52:54.006206 13005 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.196:5050) is detected
    I0919 15:52:54.006285 13005 slave.cpp:895] New master detected at 
master@10.142.55.196:5050
    I0919 15:52:54.006309 13005 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:52:54.006398 13005 slave.cpp:927] Detecting new master
    I0919 15:52:54.006451 13005 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:53:16.420258 13005 slave.cpp:4591] Current disk usage 72.76%. Max 
allowed age: 1.206748286096840days
    I0919 15:53:54.071012 13005 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:53:54.071143 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:54:01.105780 13002 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:54:01.105854 13002 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:54:01.105970 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    I0919 15:54:05.733837 13007 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:54:05.733932 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:54:05.734071 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    E0919 15:54:05.818560 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    I0919 15:54:05.818583 13003 slave.cpp:3732] master@10.142.55.196:5050 exited
    W0919 15:54:05.818758 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    I0919 15:54:06.004385 13009 detector.cpp:152] Detected a new leader: 
(id='700')
    I0919 15:54:06.004494 13009 group.cpp:706] Trying to get 
'/mesos/json.info_0000000700' in ZooKeeper
    I0919 15:54:06.005511 13009 zookeeper.cpp:259] A new leading master 
(UPID=master@10.142.55.202:5050) is detected
    I0919 15:54:06.005586 13009 slave.cpp:895] New master detected at 
master@10.142.55.202:5050
    I0919 15:54:06.005609 13009 slave.cpp:916] No credentials provided. 
Attempting to register without authentication
    I0919 15:54:06.005676 13009 slave.cpp:927] Detecting new master
    I0919 15:54:06.005720 13009 status_update_manager.cpp:174] Pausing sending 
status updates
    I0919 15:54:16.423193 13002 slave.cpp:4591] Current disk usage 72.76%. Max 
allowed age: 1.206699342406551days


slave warn log

    Log file created at: 2016/09/19 15:42:08
    Running on machine: ubuntu12
    Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
    W0919 15:42:08.419035 13006 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:43:20.081374 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:43:26.855315 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:43:26.855159 13010 process.cpp:2105] Failed to shutdown socket 
with fd 12: Transport endpoint is not connected
    W0919 15:44:32.535522 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:45:42.354857 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:46:54.075592 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:46:56.098012 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    W0919 15:46:56.098253 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:46:56.462254 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    W0919 15:46:56.462540 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:48:02.950994 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:49:12.379149 13009 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:49:12.379233 13010 process.cpp:2105] Failed to shutdown socket 
with fd 12: Transport endpoint is not connected
    W0919 15:50:24.065342 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:50:24.485839 13004 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:24.485977 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    W0919 15:50:28.343719 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:28.343819 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    W0919 15:50:31.545171 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:50:31.545284 13010 process.cpp:2105] Failed to shutdown socket 
with fd 14: Transport endpoint is not connected
    W0919 15:51:32.560410 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:52:42.336426 13002 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:53:54.071143 13005 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:54:01.105854 13002 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:54:01.105970 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    W0919 15:54:05.733932 13007 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    E0919 15:54:05.734071 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    E0919 15:54:05.818560 13010 process.cpp:2105] Failed to shutdown socket 
with fd 15: Transport endpoint is not connected
    W0919 15:54:05.818758 13003 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
    W0919 15:55:06.821486 13009 slave.cpp:3737] Master disconnected! Waiting 
for a new master to be elected
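
The warn log above uses the glog header format quoted at its top (`[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg`). As a rough aid for measuring how often the agent loses its master, the following Python sketch parses such lines and computes the gap between consecutive "Master disconnected!" warnings. The names `parse_glog` and `disconnect_gaps` are illustrative and not part of Mesos. The year is an assumption too: glog headers carry no year, so 2016 is taken from the "Log file created at" line.

```python
import re
from datetime import datetime

# glog header: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_RE = re.compile(
    r"^\s*(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_glog(lines, year=2016):
    """Return (timestamp, level, source, message) tuples.

    Lines that do not start with a glog header are treated as wrapped
    continuations and appended to the previous record's message.
    The year is an assumption, since glog headers do not include one.
    """
    records = []
    for raw in lines:
        m = GLOG_RE.match(raw)
        if m:
            ts = datetime.strptime(
                f"{year}{m['month']}{m['day']} {m['time']}",
                "%Y%m%d %H:%M:%S.%f",
            )
            source = f"{m['file']}:{m['line']}"
            records.append([ts, m["level"], source, m["msg"].strip()])
        elif records and raw.strip():
            records[-1][3] += " " + raw.strip()
    return [tuple(r) for r in records]


def disconnect_gaps(records):
    """Seconds between consecutive 'Master disconnected!' warnings."""
    times = [
        ts
        for ts, level, _source, msg in records
        if level == "W" and msg.startswith("Master disconnected!")
    ]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# First three entries of the warn log above, including the mail line wraps.
sample = """\
    W0919 15:42:08.419035 13006 slave.cpp:3737] Master disconnected! Waiting
for a new master to be elected
    W0919 15:43:20.081374 13005 slave.cpp:3737] Master disconnected! Waiting
for a new master to be elected
    E0919 15:43:26.855159 13010 process.cpp:2105] Failed to shutdown socket
with fd 12: Transport endpoint is not connected
""".splitlines()

records = parse_glog(sample)
gaps = disconnect_gaps(records)
print(gaps)
```

For the first two warnings in the log above, the gap comes out at about 71.7 seconds. That is in the same range as the `--registry_fetch_timeout="1mins"` value in the master command, which may be worth checking but is not established by this sketch. Lines quoted with a leading "> " would need that prefix stripped before parsing.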


> mesos-master can not found mesos-slave, and elect a new leader in a short 
> interval
> ----------------------------------------------------------------------------------
>
>                 Key: MESOS-6205
>                 URL: https://issues.apache.org/jira/browse/MESOS-6205
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>         Environment: ubuntu 12 x64, centos 6.5 x64, centos 7.2 x64
>            Reporter: kasim
>
> I followed this 
> [doc|https://open.mesosphere.com/getting-started/install/#verifying-installation]
>  to set up a Mesos cluster.
> There are three VMs (Ubuntu 12, CentOS 6.5, CentOS 7.2).
> {code}
>     $ cat /etc/hosts
>     10.142.55.190 zk1
>     10.142.55.196 zk2
>     10.142.55.202 zk3
> {code}
> Config on each machine:
> {code}
>     $ cat /etc/mesos/zk
>     zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos
> {code}
> ----------------------------
> After starting zookeeper, mesos-master and mesos-slave on all three VMs, I can 
> view the Mesos web UI (10.142.55.190:5050), but the agent count is 0.
> After a short time, the Mesos page shows an error:
> {code}
>     Failed to connect to 10.142.55.190:5050!
>     Retrying in 16 seconds... 
> {code}
> (I found that zookeeper elects a new leader at short intervals)
> ----------------------------------------
> mesos-master cmd:
> {code}
> mesos-master --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="false" 
> --authenticate_frameworks="false" --authenticate_http_frameworks="false" 
> --authenticate_http_readonly="false" --authenticate_http_readwrite="false" 
> --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --ip="10.142.55.190" 
> --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" 
> --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --port="5050" --quiet="false" --quorum="2" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="20secs" 
> --registry_strict="false" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/share/mesos/webui" 
> --work_dir="/var/lib/mesos" 
> --zk="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos"
> {code}
> mesos-slave cmd:
> {code}
> mesos-slave --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
> --cgroups_root="mesos" --container_disk_watch_interval="15secs" 
> --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" 
> --docker="docker" --docker_kill_orphans="true" 
> --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" 
> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
> --docker_store_dir="/tmp/mesos/store/docker" 
> --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false" 
> --executor_registration_timeout="1mins" 
> --executor_shutdown_grace_period="5secs" 
> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" 
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" 
> --hadoop_home="" --help="false" --hostname="10.142.55.190" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_command_executor="false" --image_provisioner_backend="copy" 
> --initialize_driver_logging="true" --ip="10.142.55.190" 
> --isolation="posix/cpu,posix/mem" --launcher="posix" 
> --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" 
> --logbufsecs="0" --logging_level="INFO" 
> --master="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos"
>  --oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
> --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" 
> --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
> --systemd_enable_support="true" 
> --systemd_runtime_directory="/run/systemd/system" --version="false" 
> --work_dir="/var/lib/mesos"
> {code}
> When I run mesos-master from the command line, I get:
> {code}
> I0919 17:20:19.286264 17550 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (583)@10.142.55.202:5050
> F0919 17:20:20.009371 17556 master.cpp:1536] Recovery failed: Failed to 
> recover registrar: Failed to perform fetch within 1mins
> *** Check failure stack trace: ***
>     @     0x7f9db78458dd  google::LogMessage::Fail()
>     @     0x7f9db784771d  google::LogMessage::SendToLog()
>     @     0x7f9db78454cc  google::LogMessage::Flush()
>     @     0x7f9db7848019  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f9db6e2dbbc  mesos::internal::master::fail()
>     @     0x7f9db6e75b20  
> _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EE
>             EEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
>     @           0x42a116  process::Future<>::fail()
>     @     0x7f9db6e9f705  process::internal::thenf<>()
>     @     0x7f9db6efd016  
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureIN5mesos8internal8RegistryEEEEEJRS7_EEEvRKSt6vectorIT_SaISE_EEDp
>             OT0_
> I0919 17:20:20.025172 17553 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (212)@10.142.55.196:5050
>     @     0x7f9db6f100de  process::Future<>::fail()
>     @     0x7f9db6c57e86  process::internal::run<>()
>     @     0x7f9db6f100cb  process::Future<>::fail()
>     @     0x7f9db6ef2d34  
> mesos::internal::master::RegistrarProcess::_recover()
>     @     0x7f9db77d5171  process::ProcessManager::resume()
>     @     0x7f9db77d5477  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>     @     0x7f9db5e439c0  (unknown)
>     @     0x7f9db568ae9a  start_thread
>     @     0x7f9db53b836d  (unknown)
> [1]    17548 abort (core dumped)  mesos-master --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins
> {code}
> It seems mesos-master quits on this failure, so zookeeper restarts it and 
> elects a new leader?
> ---------------------------------------------------------
> master info log:
> {code}
>     I0919 15:54:59.677438 13281 http.cpp:2022] Redirecting request for 
> /master/state?jsonp=angular.callbacks._1x to the leading master zk3
>     I0919 15:55:00.098667 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (768)@10.142.55.202:5050
>     I0919 15:55:00.385279 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (185)@10.142.55.196:5050
>     I0919 15:55:00.711119 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (771)@10.142.55.202:5050
>     I0919 15:55:01.347291 13284 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (188)@10.142.55.196:5050
>     I0919 15:55:01.597682 13284 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (774)@10.142.55.202:5050
>     I0919 15:55:02.257159 13282 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (191)@10.142.55.196:5050
>     I0919 15:55:02.370692 13287 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (777)@10.142.55.202:5050
>     I0919 15:55:03.205920 13285 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (780)@10.142.55.202:5050
>     I0919 15:55:03.260007 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (194)@10.142.55.196:5050
>     I0919 15:55:03.929611 13283 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (783)@10.142.55.202:5050
>     I0919 15:55:04.033308 13287 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (197)@10.142.55.196:5050
>     I0919 15:55:04.591275 13284 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (200)@10.142.55.196:5050
>     I0919 15:55:04.608211 13283 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (786)@10.142.55.202:5050
>     I0919 15:55:05.184682 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (789)@10.142.55.202:5050
>     I0919 15:55:05.268277 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (203)@10.142.55.196:5050
>     I0919 15:55:05.775377 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (206)@10.142.55.196:5050
>     I0919 15:55:05.916445 13285 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (792)@10.142.55.202:5050
>     I0919 15:55:06.744927 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (209)@10.142.55.196:5050
>     I0919 15:55:07.378521 13283 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (5)@10.142.55.202:5050
>     I0919 15:55:07.393311 13285 network.hpp:430] ZooKeeper group memberships 
> changed
>     I0919 15:55:07.393427 13285 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000709' in ZooKeeper
>     I0919 15:55:07.393985 13285 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000711' in ZooKeeper
>     I0919 15:55:07.394394 13285 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000714' in ZooKeeper
>     I0919 15:55:07.394843 13285 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000715' in ZooKeeper
>     I0919 15:55:07.395418 13285 network.hpp:478] ZooKeeper group PIDs: { 
> log-replica(1)@10.142.55.190:5050, log-replica(1)@10.142.55.196:5050, 
> log-replica(1)@10.142.55.202:5050 }
>     I0919 15:55:08.178272 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (14)@10.142.55.202:5050
>     I0919 15:55:09.059562 13282 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (21)@10.142.55.202:5050
>     I0919 15:55:09.700711 13286 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (24)@10.142.55.202:5050
>     I0919 15:55:09.742185 13287 http.cpp:381] HTTP GET for /master/state from 
> 10.142.50.94:59987 with User-Agent='Mozilla/5.0 (Windows NT 6.2; WOW64; 
> rv:47.0) Gecko/20100101 Firefox/47.0'
>     I0919 15:55:09.742359 13287 http.cpp:2022] Redirecting request for 
> /master/state?jsonp=angular.callbacks._1y to the leading master zk3
>     I0919 15:55:10.660789 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (30)@10.142.55.202:5050
>     I0919 15:55:11.480326 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (34)@10.142.55.202:5050
>     I0919 15:55:12.386256 13286 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (37)@10.142.55.202:5050
>     I0919 15:55:12.975137 13287 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (42)@10.142.55.202:5050
>     I0919 15:55:13.843091 13285 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (47)@10.142.55.202:5050
>     I0919 15:55:14.373478 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (51)@10.142.55.202:5050
>     I0919 15:55:14.937181 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (54)@10.142.55.202:5050
>     I0919 15:55:15.658219 13283 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (58)@10.142.55.202:5050
>     I0919 15:55:16.007822 13286 network.hpp:430] ZooKeeper group memberships 
> changed
>     I0919 15:55:16.007972 13286 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000711' in ZooKeeper
>     I0919 15:55:16.010170 13286 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000714' in ZooKeeper
>     I0919 15:55:16.011462 13284 detector.cpp:152] Detected a new leader: 
> (id='702')
>     I0919 15:55:16.011556 13284 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000702' in ZooKeeper
>     I0919 15:55:16.011968 13286 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000715' in ZooKeeper
>     I0919 15:55:16.012526 13286 network.hpp:478] ZooKeeper group PIDs: { 
> log-replica(1)@10.142.55.190:5050, log-replica(1)@10.142.55.196:5050, 
> log-replica(1)@10.142.55.202:5050 }
>     I0919 15:55:16.013156 13284 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.190:5050) is detected
>     I0919 15:55:16.013222 13284 master.cpp:1847] The newly elected leader is 
> master@10.142.55.190:5050 with id 677967bc-f6f0-46b3-a44e-72eed1befd60
>     I0919 15:55:16.013244 13284 master.cpp:1860] Elected as the leading 
> master!
>     I0919 15:55:16.013273 13284 master.cpp:1547] Recovering from registrar
>     I0919 15:55:16.013352 13284 registrar.cpp:332] Recovering registrar
>     I0919 15:55:16.014081 13280 log.cpp:553] Attempting to start the writer
>     I0919 15:55:16.014515 13280 replica.cpp:493] Replica received implicit 
> promise request from (211)@10.142.55.190:5050 with proposal 1204590
>     I0919 15:55:16.018023 13282 consensus.cpp:360] Aborting implicit promise 
> request because 2 ignores received
>     I0919 15:55:16.018028 13280 leveldb.cpp:304] Persisting metadata (10 
> bytes) to leveldb took 3.469479ms
>     I0919 15:55:16.018338 13280 replica.cpp:342] Persisted promised to 1204590
>     I0919 15:55:16.018508 13282 log.cpp:565] Could not start the writer, but 
> can be retried
>     I0919 15:55:16.018645 13282 log.cpp:553] Attempting to start the writer
>     I0919 15:55:16.018899 13282 replica.cpp:493] Replica received implicit 
> promise request from (215)@10.142.55.190:5050 with proposal 1204591
>     I0919 15:55:16.022183 13287 consensus.cpp:360] Aborting implicit promise 
> request because 2 ignores received
>     I0919 15:55:16.022367 13280 log.cpp:565] Could not start the writer, but 
> can be retried
>     I0919 15:55:16.022510 13280 log.cpp:553] Attempting to start the writer
>     I0919 15:55:16.028880 13282 leveldb.cpp:304] Persisting metadata (10 
> bytes) to leveldb took 9.870818ms
>     I0919 15:55:16.029024 13282 replica.cpp:342] Persisted promised to 1204591
>     I0919 15:55:16.029428 13286 replica.cpp:493] Replica received implicit 
> promise request from (219)@10.142.55.190:5050 with proposal 1204592
>     I0919 15:55:16.031600 13280 consensus.cpp:360] Aborting implicit promise 
> request because 2 ignores received
>     I0919 15:55:16.036208 13283 log.cpp:565] Could not start the writer, but 
> can be retried
>     I0919 15:55:16.036454 13283 log.cpp:553] Attempting to start the writer
>     I0919 15:55:16.040256 13286 leveldb.cpp:304] Persisting metadata (10 
> bytes) to leveldb took 10.783237ms
>     I0919 15:55:16.040339 13286 replica.cpp:342] Persisted promised to 1204592
>     I0919 15:55:16.040712 13286 replica.cpp:493] Replica received implicit 
> promise request from (222)@10.142.55.190:5050 with proposal 1204593
>     I0919 15:55:16.042196 13286 leveldb.cpp:304] Persisting metadata (10 
> bytes) to leveldb took 1.435071ms
>     I0919 15:55:16.042250 13286 replica.cpp:342] Persisted promised to 1204593
>     I0919 15:55:16.042981 13286 consensus.cpp:360] Aborting implicit promise 
> request because 2 ignores received
>     I0919 15:55:16.043099 13286 log.cpp:565] Could not start the writer, but 
> can be retried
>     I0919 15:55:16.043303 13283 log.cpp:553] Attempting to start the writer
> {code}
> All later log entries repeat this loop:
> {code}
>     I0919 15:55:16.043676 13286 replica.cpp:493] Replica received implicit 
> promise request from (225)@10.142.55.190:5050 with proposal 1204594
>     I0919 15:55:16.044122 13286 leveldb.cpp:304] Persisting metadata (10 
> bytes) to leveldb took 404769ns
>     I0919 15:55:16.044209 13286 replica.cpp:342] Persisted promised to 1204594
>     I0919 15:55:16.044837 13281 consensus.cpp:360] Aborting implicit promise 
> request because 2 ignores received
>     I0919 15:55:16.044926 13281 log.cpp:565] Could not start the writer, but 
> can be retried
>     I0919 15:55:16.045038 13281 log.cpp:553] Attempting to start the writer
> {code}
> slave info log:
> {code}
>     Log file created at: 2016/09/19 15:41:16
>     Running on machine: ubuntu12
>     Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>     I0919 15:41:16.346844 12986 logging.cpp:194] INFO level logging started!
>     I0919 15:41:16.363313 12986 containerizer.cpp:196] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni
>     I0919 15:41:16.370334 12986 main.cpp:434] Starting Mesos agent
>     I0919 15:41:16.371184 12986 slave.cpp:198] Agent started on 
> 1)@127.0.1.1:5051
>     I0919 15:41:16.371636 12986 slave.cpp:199] Flags at startup: 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
> --cgroups_root="mesos" --container_disk_watch_interval="15secs" 
> --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" 
> --docker="docker" --docker_kill_orphans="true" 
> --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" 
> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
> --docker_store_dir="/tmp/mesos/store/docker" 
> --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false" 
> --executor_registration_timeout="1mins" 
> --executor_shutdown_grace_period="5secs" 
> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" 
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" 
> --hadoop_home="" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_command_executor="false" 
> --image_provisioner_backend="copy" --initialize_driver_logging="true" 
> --isolation="posix/cpu,posix/mem" --launcher="posix" 
> --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" 
> --logbufsecs="0" --logging_level="INFO" 
> --master="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos"
>  --oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
> --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" 
> --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
> --systemd_enable_support="true" 
> --systemd_runtime_directory="/run/systemd/system" --version="false" 
> --work_dir="/var/lib/mesos"
>     I0919 15:41:16.373072 12986 slave.cpp:519] Agent resources: cpus(*):2; 
> mem(*):2930; disk(*):4469; ports(*):[31000-32000]
>     I0919 15:41:16.373291 12986 slave.cpp:527] Agent attributes: [  ]
>     I0919 15:41:16.373347 12986 slave.cpp:532] Agent hostname: ubuntu12
>     I0919 15:41:16.379895 13005 state.cpp:57] Recovering state from 
> '/var/lib/mesos/meta'
>     I0919 15:41:16.382519 13005 group.cpp:349] Group process 
> (group(1)@127.0.1.1:5051) connected to ZooKeeper
>     I0919 15:41:16.382593 13005 group.cpp:837] Syncing group operations: 
> queue size (joins, cancels, datas) = (0, 0, 0)
>     I0919 15:41:16.382663 13005 group.cpp:427] Trying to create path '/mesos' 
> in ZooKeeper
>     I0919 15:41:16.382910 13009 status_update_manager.cpp:200] Recovering 
> status update manager
>     I0919 15:41:16.383419 13009 containerizer.cpp:522] Recovering 
> containerizer
>     I0919 15:41:16.392206 13004 provisioner.cpp:253] Provisioner recovery 
> complete
>     I0919 15:41:16.392354 13004 slave.cpp:4782] Finished recovery
>     I0919 15:41:16.405709 13004 detector.cpp:152] Detected a new leader: 
> (id='678')
>     I0919 15:41:16.406067 13005 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000678' in ZooKeeper
>     I0919 15:41:16.407572 13002 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.190:5050) is detected
>     I0919 15:41:16.407977 13002 slave.cpp:895] New master detected at 
> master@10.142.55.190:5050
>     I0919 15:41:16.408043 13002 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:41:16.408140 13002 slave.cpp:927] Detecting new master
>     I0919 15:41:16.408223 13005 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:42:08.418956 13006 slave.cpp:3732] master@10.142.55.190:5050 
> exited
>     W0919 15:42:08.419035 13006 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:42:16.374977 13007 slave.cpp:4591] Current disk usage 72.41%. 
> Max allowed age: 1.231186482451933days
>     I0919 15:42:20.007169 13007 detector.cpp:152] Detected a new leader: 
> (id='679')
>     I0919 15:42:20.007297 13007 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000679' in ZooKeeper
>     I0919 15:42:20.008503 13007 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.196:5050) is detected
>     I0919 15:42:20.008587 13007 slave.cpp:895] New master detected at 
> master@10.142.55.196:5050
>     I0919 15:42:20.008610 13007 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:42:20.008703 13007 slave.cpp:927] Detecting new master
>     I0919 15:42:20.008750 13007 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:43:16.387984 13005 slave.cpp:4591] Current disk usage 72.41%. 
> Max allowed age: 1.231162010606794days
>     I0919 15:43:20.081272 13005 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:43:20.081374 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:43:26.855154 13005 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:43:26.855315 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:43:26.855159 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 12: Transport endpoint is not connected
>     I0919 15:43:32.020196 13002 detector.cpp:152] Detected a new leader: 
> (id='682')
>     I0919 15:43:32.020300 13002 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000682' in ZooKeeper
>     I0919 15:43:32.022203 13002 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.202:5050) is detected
>     I0919 15:43:32.022302 13002 slave.cpp:895] New master detected at 
> master@10.142.55.202:5050
>     I0919 15:43:32.022328 13002 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:43:32.022382 13002 slave.cpp:927] Detecting new master
>     I0919 15:43:32.022423 13002 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:44:16.389369 13003 slave.cpp:4591] Current disk usage 72.41%. 
> Max allowed age: 1.231119184877789days
>     I0919 15:44:32.535347 13003 slave.cpp:3732] master@10.142.55.202:5050 
> exited
>     W0919 15:44:32.535522 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:44:42.005375 13002 detector.cpp:152] Detected a new leader: 
> (id='684')
>     I0919 15:44:42.005496 13002 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000684' in ZooKeeper
>     I0919 15:44:42.006367 13002 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.190:5050) is detected
>     I0919 15:44:42.006492 13002 slave.cpp:895] New master detected at 
> master@10.142.55.190:5050
>     I0919 15:44:42.006597 13002 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:44:42.006675 13002 slave.cpp:927] Detecting new master
>     I0919 15:44:42.006577 13008 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:45:16.400794 13006 slave.cpp:4591] Current disk usage 72.48%. 
> Max allowed age: 1.226390000804074days
>     I0919 15:45:42.354790 13005 slave.cpp:3732] master@10.142.55.190:5050 
> exited
>     W0919 15:45:42.354857 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:45:54.020563 13002 detector.cpp:152] Detected a new leader: 
> (id='687')
>     I0919 15:45:54.020756 13002 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000687' in ZooKeeper
>     I0919 15:45:54.023296 13002 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.196:5050) is detected
>     I0919 15:45:54.023455 13002 slave.cpp:895] New master detected at 
> master@10.142.55.196:5050
>     I0919 15:45:54.023558 13002 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:45:54.023526 13008 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:45:54.023669 13002 slave.cpp:927] Detecting new master
>     I0919 15:46:16.402402 13003 slave.cpp:4591] Current disk usage 72.53%. 
> Max allowed age: 1.223205601954942days
>     I0919 15:46:54.075505 13007 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:46:54.075592 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:46:56.098012 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     I0919 15:46:56.098016 13007 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:46:56.098253 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:46:56.462254 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     I0919 15:46:56.462260 13005 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:46:56.462540 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:47:02.005637 13009 detector.cpp:152] Detected a new leader: 
> (id='688')
>     I0919 15:47:02.005765 13009 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000688' in ZooKeeper
>     I0919 15:47:02.006853 13009 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.202:5050) is detected
>     I0919 15:47:02.006959 13009 slave.cpp:895] New master detected at 
> master@10.142.55.202:5050
>     I0919 15:47:02.006986 13009 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:47:02.007025 13009 slave.cpp:927] Detecting new master
>     I0919 15:47:02.007061 13009 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:47:16.406669 13008 slave.cpp:4591] Current disk usage 72.53%. 
> Max allowed age: 1.223184189090440days
>     I0919 15:48:02.950891 13005 slave.cpp:3732] master@10.142.55.202:5050 
> exited
>     W0919 15:48:02.950994 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:48:12.006634 13005 detector.cpp:152] Detected a new leader: 
> (id='690')
>     I0919 15:48:12.006817 13003 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000690' in ZooKeeper
>     I0919 15:48:12.007987 13003 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.190:5050) is detected
>     I0919 15:48:12.008126 13003 slave.cpp:895] New master detected at 
> master@10.142.55.190:5050
>     I0919 15:48:12.008210 13003 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:48:12.008280 13003 slave.cpp:927] Detecting new master
>     I0919 15:48:12.008191 13008 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:48:16.409266 13003 slave.cpp:4591] Current disk usage 72.54%. 
> Max allowed age: 1.222480623542604days
>     I0919 15:49:12.379010 13009 slave.cpp:3732] master@10.142.55.190:5050 
> exited
>     W0919 15:49:12.379149 13009 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:49:12.379233 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 12: Transport endpoint is not connected
>     I0919 15:49:16.413767 13007 slave.cpp:4591] Current disk usage 72.64%. 
> Max allowed age: 1.215032005677465days
>     I0919 15:49:24.016290 13007 detector.cpp:152] Detected a new leader: 
> (id='693')
>     I0919 15:49:24.016417 13007 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000693' in ZooKeeper
>     I0919 15:49:24.018273 13007 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.196:5050) is detected
>     I0919 15:49:24.018437 13007 slave.cpp:895] New master detected at 
> master@10.142.55.196:5050
>     I0919 15:49:24.018523 13007 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:49:24.018604 13007 slave.cpp:927] Detecting new master
>     I0919 15:49:24.018496 13008 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:50:16.416391 13008 slave.cpp:4591] Current disk usage 72.64%. 
> Max allowed age: 1.215016710774248days
>     I0919 15:50:24.065268 13003 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:50:24.065342 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:50:24.485752 13004 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:50:24.485839 13004 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:24.485977 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     I0919 15:50:28.343647 13003 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:50:28.343719 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:28.343819 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     I0919 15:50:31.545099 13005 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:50:31.545171 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:31.545284 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     I0919 15:50:32.007096 13008 detector.cpp:152] Detected a new leader: 
> (id='694')
>     I0919 15:50:32.007195 13008 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000694' in ZooKeeper
>     I0919 15:50:32.009881 13008 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.202:5050) is detected
>     I0919 15:50:32.009970 13008 slave.cpp:895] New master detected at 
> master@10.142.55.202:5050
>     I0919 15:50:32.009994 13008 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:50:32.010030 13008 slave.cpp:927] Detecting new master
>     I0919 15:50:32.010079 13008 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:51:16.417846 13006 slave.cpp:4591] Current disk usage 72.64%. 
> Max allowed age: 1.214964708103322days
>     I0919 15:51:32.560317 13003 slave.cpp:3732] master@10.142.55.202:5050 
> exited
>     W0919 15:51:32.560410 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:51:42.005147 13009 detector.cpp:152] Detected a new leader: 
> (id='696')
>     I0919 15:51:42.005265 13009 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000696' in ZooKeeper
>     I0919 15:51:42.006824 13009 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.190:5050) is detected
>     I0919 15:51:42.006904 13009 slave.cpp:895] New master detected at 
> master@10.142.55.190:5050
>     I0919 15:51:42.006928 13009 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:51:42.006963 13009 slave.cpp:927] Detecting new master
>     I0919 15:51:42.006999 13009 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:52:16.419373 13003 slave.cpp:4591] Current disk usage 72.71%. 
> Max allowed age: 1.209981628636250days
>     I0919 15:52:42.336305 13002 slave.cpp:3732] master@10.142.55.190:5050 
> exited
>     W0919 15:52:42.336426 13002 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:52:54.005267 13005 detector.cpp:152] Detected a new leader: 
> (id='699')
>     I0919 15:52:54.005408 13005 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000699' in ZooKeeper
>     I0919 15:52:54.006206 13005 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.196:5050) is detected
>     I0919 15:52:54.006285 13005 slave.cpp:895] New master detected at 
> master@10.142.55.196:5050
>     I0919 15:52:54.006309 13005 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:52:54.006398 13005 slave.cpp:927] Detecting new master
>     I0919 15:52:54.006451 13005 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:53:16.420258 13005 slave.cpp:4591] Current disk usage 72.76%. 
> Max allowed age: 1.206748286096840days
>     I0919 15:53:54.071012 13005 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:53:54.071143 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:54:01.105780 13002 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:54:01.105854 13002 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:54:01.105970 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     I0919 15:54:05.733837 13007 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:54:05.733932 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:54:05.734071 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     E0919 15:54:05.818560 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     I0919 15:54:05.818583 13003 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:54:05.818758 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:54:06.004385 13009 detector.cpp:152] Detected a new leader: 
> (id='700')
>     I0919 15:54:06.004494 13009 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000700' in ZooKeeper
>     I0919 15:54:06.005511 13009 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.202:5050) is detected
>     I0919 15:54:06.005586 13009 slave.cpp:895] New master detected at 
> master@10.142.55.202:5050
>     I0919 15:54:06.005609 13009 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:54:06.005676 13009 slave.cpp:927] Detecting new master
>     I0919 15:54:06.005720 13009 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:54:16.423193 13002 slave.cpp:4591] Current disk usage 72.76%. 
> Max allowed age: 1.206699342406551days
> {code}
> slave warning log:
> {code}
>     Log file created at: 2016/09/19 15:42:08
>     Running on machine: ubuntu12
>     Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>     W0919 15:42:08.419035 13006 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:43:20.081374 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:43:26.855315 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:43:26.855159 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 12: Transport endpoint is not connected
>     W0919 15:44:32.535522 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:45:42.354857 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:46:54.075592 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:46:56.098012 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     W0919 15:46:56.098253 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:46:56.462254 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     W0919 15:46:56.462540 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:48:02.950994 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:49:12.379149 13009 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:49:12.379233 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 12: Transport endpoint is not connected
>     W0919 15:50:24.065342 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:50:24.485839 13004 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:24.485977 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     W0919 15:50:28.343719 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:28.343819 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     W0919 15:50:31.545171 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:31.545284 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     W0919 15:51:32.560410 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:52:42.336426 13002 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:53:54.071143 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:54:01.105854 13002 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:54:01.105970 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     W0919 15:54:05.733932 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:54:05.734071 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     E0919 15:54:05.818560 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     W0919 15:54:05.818758 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:55:06.821486 13009 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
