[ 
https://issues.apache.org/jira/browse/MESOS-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504119#comment-15504119
 ] 

Joseph Wu commented on MESOS-6205:
----------------------------------

There are two repeating log messages that tell you (indirectly) that something 
is wrong:
{code}
I0919 15:55:08.178272 13280 replica.cpp:673] Replica in VOTING status received 
a broadcasted recover request from (14)@10.142.55.202:5050
{code}
This message means that you've started this master before with the same work 
directory, so that directory already holds persistent state from the earlier 
run.

This log message tells you that there are two masters you have *not* started 
before:
{code}
I0919 15:55:16.018023 13282 consensus.cpp:360] Aborting implicit promise 
request because 2 ignores received
{code}
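
As a rough way to check a master log for these two messages, here is a minimal 
Python sketch. The function name summarize_master_log and both regular 
expressions are my own assumptions, derived only from the two log lines quoted 
above; they are not part of Mesos and may need adjusting for other versions.

```python
import re
from collections import Counter

# Patterns derived only from the two messages quoted above (an assumption,
# not an official Mesos log format). \s+ tolerates lines wrapped by e-mail.
RECOVER_RE = re.compile(
    r"Replica\s+in\s+(\w+)\s+status\s+received\s+a\s+broadcasted\s+recover\s+"
    r"request\s+from\s+\(\d+\)@([\d.]+):(\d+)"
)
IGNORES_RE = re.compile(
    r"Aborting\s+implicit\s+promise\s+request\s+because\s+(\d+)\s+ignores\s+"
    r"received"
)


def summarize_master_log(text):
    """Count recover requests per sender IP and collect the ignore counts."""
    senders = Counter(m.group(2) for m in RECOVER_RE.finditer(text))
    ignores = [int(m.group(1)) for m in IGNORES_RE.finditer(text)]
    return {"recover_requests_from": dict(senders), "ignores": ignores}


if __name__ == "__main__":
    sample = (
        "I0919 15:55:08.178272 13280 replica.cpp:673] Replica in VOTING status "
        "received a broadcasted recover request from (14)@10.142.55.202:5050\n"
        "I0919 15:55:16.018023 13282 consensus.cpp:360] Aborting implicit "
        "promise request because 2 ignores received\n"
    )
    print(summarize_master_log(sample))
```

Many recover requests from the same peers together with a steady stream of 
"ignores received" lines is the pattern described here.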

The masters will refuse to start because fewer than a quorum of masters have 
the persistent state.  If the masters were to start anyway, you could lose 
data.  This is the expected behavior, as Mesos errs on the side of caution.  
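
To make the arithmetic concrete, here is a small Python sketch of that rule. 
It is a simplified model of what this comment describes, not the actual 
recovery protocol in Mesos; the function name diagnose_masters and the three 
labels are hypothetical. The case where no master has prior state is treated 
separately, since those masters can reach consensus.

```python
def diagnose_masters(masters_with_state, quorum):
    """Classify a cluster by how many masters hold persistent state.

    Simplified model (an assumption, not Mesos code):
      - "recoverable": at least a quorum of masters have state
      - "fresh":       no master has state
      - "stalled":     some masters have state, but fewer than a quorum
    """
    if masters_with_state < 0 or quorum < 1:
        raise ValueError("counts must be non-negative and quorum >= 1")
    if masters_with_state >= quorum:
        return "recoverable"
    if masters_with_state == 0:
        return "fresh"
    return "stalled"


if __name__ == "__main__":
    # The report uses --quorum="2"; one master has state, two do not.
    print(diagnose_masters(masters_with_state=1, quorum=2))
```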

I'm assuming you want a fresh cluster (no prior state); you can fix this by 
deleting the work directory of the master on the {{10.142.55.202}} node.  If 
none of the masters have any prior state, they will reach consensus.
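
If you would rather keep a copy than delete the directory outright, a minimal 
Python sketch that renames it instead is below. The helper name 
set_aside_work_dir and the ".bak-" naming are my own assumptions, and it 
assumes mesos-master has already been stopped on that node. Note that in your 
setup the agent uses the same --work_dir="/var/lib/mesos", so its state under 
that directory would be moved as well.

```python
import os
import shutil
import time


def set_aside_work_dir(work_dir, stamp=None):
    """Rename work_dir to a timestamped backup and return the backup path.

    Returns None if work_dir does not exist. The caller is assumed to have
    stopped mesos-master on this node first (hypothetical helper, not part
    of Mesos).
    """
    if not os.path.isdir(work_dir):
        return None
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    backup = "{}.bak-{}".format(work_dir.rstrip("/"), stamp)
    if os.path.exists(backup):
        raise FileExistsError(backup)
    shutil.move(work_dir, backup)
    return backup


# Example (not executed here): set_aside_work_dir("/var/lib/mesos")
```

Once the masters are running again and the cluster looks healthy, the backup 
can be removed.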

> mesos-master can not found mesos-slave, and elect a new leader in a short 
> interval
> ----------------------------------------------------------------------------------
>
>                 Key: MESOS-6205
>                 URL: https://issues.apache.org/jira/browse/MESOS-6205
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>         Environment: ubuntu 12 x64, centos 6.5 x64, centos 7.2 x64
>            Reporter: kasim
>
> I followed this 
> [doc][https://open.mesosphere.com/getting-started/install/#verifying-installation]
>  to set up a mesos cluster.
> There are three VMs (ubuntu 12, centos 6.5, centos 7.2).
>     $ cat /etc/hosts
>     10.142.55.190 zk1
>     10.142.55.196 zk2
>     10.142.55.202 zk3
> Config on each machine:
>     $ cat /etc/mesos/zk
>     zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos
> ----------------------------
> After starting zookeeper, mesos-master and mesos-slave on all three VMs, I 
> can view the mesos webui (10.142.55.190:5050), but the agent count is 0.
> After a little while, the mesos page shows an error:
>     Failed to connect to 10.142.55.190:5050!
>     Retrying in 16 seconds... 
> (I found that zookeeper elects a new leader at short intervals)
> ----------------------------------------
> mesos-master cmd:
> ```
> mesos-master --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="false" 
> --authenticate_frameworks="false" --authenticate_http_frameworks="false" 
> --authenticate_http_readonly="false" --authenticate_http_readwrite="false" 
> --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --ip="10.142.55.190" 
> --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" 
> --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --port="5050" --quiet="false" --quorum="2" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="20secs" 
> --registry_strict="false" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/share/mesos/webui" 
> --work_dir="/var/lib/mesos" 
> --zk="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos"
> ```
> mesos-slave cmd:
> ```
> mesos-slave --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
> --cgroups_root="mesos" --container_disk_watch_interval="15secs" 
> --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" 
> --docker="docker" --docker_kill_orphans="true" 
> --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" 
> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
> --docker_store_dir="/tmp/mesos/store/docker" 
> --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false" 
> --executor_registration_timeout="1mins" 
> --executor_shutdown_grace_period="5secs" 
> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" 
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" 
> --hadoop_home="" --help="false" --hostname="10.142.55.190" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_command_executor="false" --image_provisioner_backend="copy" 
> --initialize_driver_logging="true" --ip="10.142.55.190" 
> --isolation="posix/cpu,posix/mem" --launcher="posix" 
> --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" 
> --logbufsecs="0" --logging_level="INFO" 
> --master="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos"
>  --oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
> --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" 
> --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
> --systemd_enable_support="true" 
> --systemd_runtime_directory="/run/systemd/system" --version="false" 
> --work_dir="/var/lib/mesos"
> ```
> When I run mesos-master from the command line, I get:
> ```
> I0919 17:20:19.286264 17550 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (583)@10.142.55.202:5050
> F0919 17:20:20.009371 17556 master.cpp:1536] Recovery failed: Failed to 
> recover registrar: Failed to perform fetch within 1mins
> *** Check failure stack trace: ***
>     @     0x7f9db78458dd  google::LogMessage::Fail()
>     @     0x7f9db784771d  google::LogMessage::SendToLog()
>     @     0x7f9db78454cc  google::LogMessage::Flush()
>     @     0x7f9db7848019  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f9db6e2dbbc  mesos::internal::master::fail()
>     @     0x7f9db6e75b20  
> _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EE
>             EEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
>     @           0x42a116  process::Future<>::fail()
>     @     0x7f9db6e9f705  process::internal::thenf<>()
>     @     0x7f9db6efd016  
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureIN5mesos8internal8RegistryEEEEEJRS7_EEEvRKSt6vectorIT_SaISE_EEDp
>             OT0_
> I0919 17:20:20.025172 17553 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (212)@10.142.55.196:5050
>     @     0x7f9db6f100de  process::Future<>::fail()
>     @     0x7f9db6c57e86  process::internal::run<>()
>     @     0x7f9db6f100cb  process::Future<>::fail()
>     @     0x7f9db6ef2d34  
> mesos::internal::master::RegistrarProcess::_recover()
>     @     0x7f9db77d5171  process::ProcessManager::resume()
>     @     0x7f9db77d5477  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>     @     0x7f9db5e439c0  (unknown)
>     @     0x7f9db568ae9a  start_thread
>     @     0x7f9db53b836d  (unknown)
> [1]    17548 abort (core dumped)  mesos-master --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins
> ```
> It seems mesos-master exits on this failure, so zookeeper restarts it and 
> elects a new leader?
> ---------------------------------------------------------
> master info log:
>     I0919 15:54:59.677438 13281 http.cpp:2022] Redirecting request for 
> /master/state?jsonp=angular.callbacks._1x to the leading master zk3
>     I0919 15:55:00.098667 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (768)@10.142.55.202:5050
>     I0919 15:55:00.385279 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (185)@10.142.55.196:5050
>     I0919 15:55:00.711119 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (771)@10.142.55.202:5050
>     I0919 15:55:01.347291 13284 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (188)@10.142.55.196:5050
>     I0919 15:55:01.597682 13284 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (774)@10.142.55.202:5050
>     I0919 15:55:02.257159 13282 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (191)@10.142.55.196:5050
>     I0919 15:55:02.370692 13287 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (777)@10.142.55.202:5050
>     I0919 15:55:03.205920 13285 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (780)@10.142.55.202:5050
>     I0919 15:55:03.260007 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (194)@10.142.55.196:5050
>     I0919 15:55:03.929611 13283 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (783)@10.142.55.202:5050
>     I0919 15:55:04.033308 13287 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (197)@10.142.55.196:5050
>     I0919 15:55:04.591275 13284 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (200)@10.142.55.196:5050
>     I0919 15:55:04.608211 13283 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (786)@10.142.55.202:5050
>     I0919 15:55:05.184682 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (789)@10.142.55.202:5050
>     I0919 15:55:05.268277 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (203)@10.142.55.196:5050
>     I0919 15:55:05.775377 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (206)@10.142.55.196:5050
>     I0919 15:55:05.916445 13285 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (792)@10.142.55.202:5050
>     I0919 15:55:06.744927 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (209)@10.142.55.196:5050
>     I0919 15:55:07.378521 13283 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (5)@10.142.55.202:5050
>     I0919 15:55:07.393311 13285 network.hpp:430] ZooKeeper group memberships 
> changed
>     I0919 15:55:07.393427 13285 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000709' in ZooKeeper
>     I0919 15:55:07.393985 13285 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000711' in ZooKeeper
>     I0919 15:55:07.394394 13285 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000714' in ZooKeeper
>     I0919 15:55:07.394843 13285 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000715' in ZooKeeper
>     I0919 15:55:07.395418 13285 network.hpp:478] ZooKeeper group PIDs: { 
> log-replica(1)@10.142.55.190:5050, log-replica(1)@10.142.55.196:5050, 
> log-replica(1)@10.142.55.202:5050 }
>     I0919 15:55:08.178272 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (14)@10.142.55.202:5050
>     I0919 15:55:09.059562 13282 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (21)@10.142.55.202:5050
>     I0919 15:55:09.700711 13286 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (24)@10.142.55.202:5050
>     I0919 15:55:09.742185 13287 http.cpp:381] HTTP GET for /master/state from 
> 10.142.50.94:59987 with User-Agent='Mozilla/5.0 (Windows NT 6.2; WOW64; 
> rv:47.0) Gecko/20100101 Firefox/47.0'
>     I0919 15:55:09.742359 13287 http.cpp:2022] Redirecting request for 
> /master/state?jsonp=angular.callbacks._1y to the leading master zk3
>     I0919 15:55:10.660789 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (30)@10.142.55.202:5050
>     I0919 15:55:11.480326 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (34)@10.142.55.202:5050
>     I0919 15:55:12.386256 13286 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (37)@10.142.55.202:5050
>     I0919 15:55:12.975137 13287 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (42)@10.142.55.202:5050
>     I0919 15:55:13.843091 13285 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (47)@10.142.55.202:5050
>     I0919 15:55:14.373478 13281 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (51)@10.142.55.202:5050
>     I0919 15:55:14.937181 13280 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (54)@10.142.55.202:5050
>     I0919 15:55:15.658219 13283 replica.cpp:673] Replica in VOTING status 
> received a broadcasted recover request from (58)@10.142.55.202:5050
>     I0919 15:55:16.007822 13286 network.hpp:430] ZooKeeper group memberships 
> changed
>     I0919 15:55:16.007972 13286 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000711' in ZooKeeper
>     I0919 15:55:16.010170 13286 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000714' in ZooKeeper
>     I0919 15:55:16.011462 13284 detector.cpp:152] Detected a new leader: 
> (id='702')
>     I0919 15:55:16.011556 13284 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000702' in ZooKeeper
>     I0919 15:55:16.011968 13286 group.cpp:706] Trying to get 
> '/mesos/log_replicas/0000000715' in ZooKeeper
>     I0919 15:55:16.012526 13286 network.hpp:478] ZooKeeper group PIDs: { 
> log-replica(1)@10.142.55.190:5050, log-replica(1)@10.142.55.196:5050, 
> log-replica(1)@10.142.55.202:5050 }
>     I0919 15:55:16.013156 13284 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.190:5050) is detected
>     I0919 15:55:16.013222 13284 master.cpp:1847] The newly elected leader is 
> master@10.142.55.190:5050 with id 677967bc-f6f0-46b3-a44e-72eed1befd60
>     I0919 15:55:16.013244 13284 master.cpp:1860] Elected as the leading 
> master!
>     I0919 15:55:16.013273 13284 master.cpp:1547] Recovering from registrar
>     I0919 15:55:16.013352 13284 registrar.cpp:332] Recovering registrar
>     I0919 15:55:16.014081 13280 log.cpp:553] Attempting to start the writer
>     I0919 15:55:16.014515 13280 replica.cpp:493] Replica received implicit 
> promise request from (211)@10.142.55.190:5050 with proposal 1204590
>     I0919 15:55:16.018023 13282 consensus.cpp:360] Aborting implicit promise 
> request because 2 ignores received
>     I0919 15:55:16.018028 13280 leveldb.cpp:304] Persisting metadata (10 
> bytes) to leveldb took 3.469479ms
>     I0919 15:55:16.018338 13280 replica.cpp:342] Persisted promised to 1204590
>     I0919 15:55:16.018508 13282 log.cpp:565] Could not start the writer, but 
> can be retried
>     I0919 15:55:16.018645 13282 log.cpp:553] Attempting to start the writer
>     I0919 15:55:16.018899 13282 replica.cpp:493] Replica received implicit 
> promise request from (215)@10.142.55.190:5050 with proposal 1204591
>     I0919 15:55:16.022183 13287 consensus.cpp:360] Aborting implicit promise 
> request because 2 ignores received
>     I0919 15:55:16.022367 13280 log.cpp:565] Could not start the writer, but 
> can be retried
>     I0919 15:55:16.022510 13280 log.cpp:553] Attempting to start the writer
>     I0919 15:55:16.028880 13282 leveldb.cpp:304] Persisting metadata (10 
> bytes) to leveldb took 9.870818ms
>     I0919 15:55:16.029024 13282 replica.cpp:342] Persisted promised to 1204591
>     I0919 15:55:16.029428 13286 replica.cpp:493] Replica received implicit 
> promise request from (219)@10.142.55.190:5050 with proposal 1204592
>     I0919 15:55:16.031600 13280 consensus.cpp:360] Aborting implicit promise 
> request because 2 ignores received
>     I0919 15:55:16.036208 13283 log.cpp:565] Could not start the writer, but 
> can be retried
>     I0919 15:55:16.036454 13283 log.cpp:553] Attempting to start the writer
>     I0919 15:55:16.040256 13286 leveldb.cpp:304] Persisting metadata (10 
> bytes) to leveldb took 10.783237ms
>     I0919 15:55:16.040339 13286 replica.cpp:342] Persisted promised to 1204592
>     I0919 15:55:16.040712 13286 replica.cpp:493] Replica received implicit 
> promise request from (222)@10.142.55.190:5050 with proposal 1204593
>     I0919 15:55:16.042196 13286 leveldb.cpp:304] Persisting metadata (10 
> bytes) to leveldb took 1.435071ms
>     I0919 15:55:16.042250 13286 replica.cpp:342] Persisted promised to 1204593
>     I0919 15:55:16.042981 13286 consensus.cpp:360] Aborting implicit promise 
> request because 2 ignores received
>     I0919 15:55:16.043099 13286 log.cpp:565] Could not start the writer, but 
> can be retried
>     I0919 15:55:16.043303 13283 log.cpp:553] Attempting to start the writer
> All later log entries repeat the same loop:
>     I0919 15:55:16.043676 13286 replica.cpp:493] Replica received implicit 
> promise request from (225)@10.142.55.190:5050 with proposal 1204594
>     I0919 15:55:16.044122 13286 leveldb.cpp:304] Persisting metadata (10 
> bytes) to leveldb took 404769ns
>     I0919 15:55:16.044209 13286 replica.cpp:342] Persisted promised to 1204594
>     I0919 15:55:16.044837 13281 consensus.cpp:360] Aborting implicit promise 
> request because 2 ignores received
>     I0919 15:55:16.044926 13281 log.cpp:565] Could not start the writer, but 
> can be retried
>     I0919 15:55:16.045038 13281 log.cpp:553] Attempting to start the writer
> slave info log:
>     Log file created at: 2016/09/19 15:41:16
>     Running on machine: ubuntu12
>     Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>     I0919 15:41:16.346844 12986 logging.cpp:194] INFO level logging started!
>     I0919 15:41:16.363313 12986 containerizer.cpp:196] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni
>     I0919 15:41:16.370334 12986 main.cpp:434] Starting Mesos agent
>     I0919 15:41:16.371184 12986 slave.cpp:198] Agent started on 
> 1)@127.0.1.1:5051
>     I0919 15:41:16.371636 12986 slave.cpp:199] Flags at startup: 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
> --cgroups_root="mesos" --container_disk_watch_interval="15secs" 
> --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" 
> --docker="docker" --docker_kill_orphans="true" 
> --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" 
> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
> --docker_store_dir="/tmp/mesos/store/docker" 
> --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false" 
> --executor_registration_timeout="1mins" 
> --executor_shutdown_grace_period="5secs" 
> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" 
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" 
> --hadoop_home="" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_command_executor="false" 
> --image_provisioner_backend="copy" --initialize_driver_logging="true" 
> --isolation="posix/cpu,posix/mem" --launcher="posix" 
> --launcher_dir="/usr/libexec/mesos" --log_dir="/var/log/mesos" 
> --logbufsecs="0" --logging_level="INFO" 
> --master="zk://10.142.55.190:2181,10.142.55.196:2181,10.142.55.202:2181/mesos"
>  --oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
> --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" 
> --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
> --systemd_enable_support="true" 
> --systemd_runtime_directory="/run/systemd/system" --version="false" 
> --work_dir="/var/lib/mesos"
>     I0919 15:41:16.373072 12986 slave.cpp:519] Agent resources: cpus(*):2; 
> mem(*):2930; disk(*):4469; ports(*):[31000-32000]
>     I0919 15:41:16.373291 12986 slave.cpp:527] Agent attributes: [  ]
>     I0919 15:41:16.373347 12986 slave.cpp:532] Agent hostname: ubuntu12
>     I0919 15:41:16.379895 13005 state.cpp:57] Recovering state from 
> '/var/lib/mesos/meta'
>     I0919 15:41:16.382519 13005 group.cpp:349] Group process 
> (group(1)@127.0.1.1:5051) connected to ZooKeeper
>     I0919 15:41:16.382593 13005 group.cpp:837] Syncing group operations: 
> queue size (joins, cancels, datas) = (0, 0, 0)
>     I0919 15:41:16.382663 13005 group.cpp:427] Trying to create path '/mesos' 
> in ZooKeeper
>     I0919 15:41:16.382910 13009 status_update_manager.cpp:200] Recovering 
> status update manager
>     I0919 15:41:16.383419 13009 containerizer.cpp:522] Recovering 
> containerizer
>     I0919 15:41:16.392206 13004 provisioner.cpp:253] Provisioner recovery 
> complete
>     I0919 15:41:16.392354 13004 slave.cpp:4782] Finished recovery
>     I0919 15:41:16.405709 13004 detector.cpp:152] Detected a new leader: 
> (id='678')
>     I0919 15:41:16.406067 13005 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000678' in ZooKeeper
>     I0919 15:41:16.407572 13002 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.190:5050) is detected
>     I0919 15:41:16.407977 13002 slave.cpp:895] New master detected at 
> master@10.142.55.190:5050
>     I0919 15:41:16.408043 13002 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:41:16.408140 13002 slave.cpp:927] Detecting new master
>     I0919 15:41:16.408223 13005 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:42:08.418956 13006 slave.cpp:3732] master@10.142.55.190:5050 
> exited
>     W0919 15:42:08.419035 13006 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:42:16.374977 13007 slave.cpp:4591] Current disk usage 72.41%. 
> Max allowed age: 1.231186482451933days
>     I0919 15:42:20.007169 13007 detector.cpp:152] Detected a new leader: 
> (id='679')
>     I0919 15:42:20.007297 13007 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000679' in ZooKeeper
>     I0919 15:42:20.008503 13007 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.196:5050) is detected
>     I0919 15:42:20.008587 13007 slave.cpp:895] New master detected at 
> master@10.142.55.196:5050
>     I0919 15:42:20.008610 13007 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:42:20.008703 13007 slave.cpp:927] Detecting new master
>     I0919 15:42:20.008750 13007 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:43:16.387984 13005 slave.cpp:4591] Current disk usage 72.41%. 
> Max allowed age: 1.231162010606794days
>     I0919 15:43:20.081272 13005 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:43:20.081374 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:43:26.855154 13005 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:43:26.855315 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:43:26.855159 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 12: Transport endpoint is not connected
>     I0919 15:43:32.020196 13002 detector.cpp:152] Detected a new leader: 
> (id='682')
>     I0919 15:43:32.020300 13002 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000682' in ZooKeeper
>     I0919 15:43:32.022203 13002 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.202:5050) is detected
>     I0919 15:43:32.022302 13002 slave.cpp:895] New master detected at 
> master@10.142.55.202:5050
>     I0919 15:43:32.022328 13002 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:43:32.022382 13002 slave.cpp:927] Detecting new master
>     I0919 15:43:32.022423 13002 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:44:16.389369 13003 slave.cpp:4591] Current disk usage 72.41%. 
> Max allowed age: 1.231119184877789days
>     I0919 15:44:32.535347 13003 slave.cpp:3732] master@10.142.55.202:5050 
> exited
>     W0919 15:44:32.535522 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:44:42.005375 13002 detector.cpp:152] Detected a new leader: 
> (id='684')
>     I0919 15:44:42.005496 13002 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000684' in ZooKeeper
>     I0919 15:44:42.006367 13002 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.190:5050) is detected
>     I0919 15:44:42.006492 13002 slave.cpp:895] New master detected at 
> master@10.142.55.190:5050
>     I0919 15:44:42.006597 13002 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:44:42.006675 13002 slave.cpp:927] Detecting new master
>     I0919 15:44:42.006577 13008 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:45:16.400794 13006 slave.cpp:4591] Current disk usage 72.48%. 
> Max allowed age: 1.226390000804074days
>     I0919 15:45:42.354790 13005 slave.cpp:3732] master@10.142.55.190:5050 
> exited
>     W0919 15:45:42.354857 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:45:54.020563 13002 detector.cpp:152] Detected a new leader: 
> (id='687')
>     I0919 15:45:54.020756 13002 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000687' in ZooKeeper
>     I0919 15:45:54.023296 13002 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.196:5050) is detected
>     I0919 15:45:54.023455 13002 slave.cpp:895] New master detected at 
> master@10.142.55.196:5050
>     I0919 15:45:54.023558 13002 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:45:54.023526 13008 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:45:54.023669 13002 slave.cpp:927] Detecting new master
>     I0919 15:46:16.402402 13003 slave.cpp:4591] Current disk usage 72.53%. 
> Max allowed age: 1.223205601954942days
>     I0919 15:46:54.075505 13007 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:46:54.075592 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:46:56.098012 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     I0919 15:46:56.098016 13007 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:46:56.098253 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:46:56.462254 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     I0919 15:46:56.462260 13005 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:46:56.462540 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:47:02.005637 13009 detector.cpp:152] Detected a new leader: 
> (id='688')
>     I0919 15:47:02.005765 13009 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000688' in ZooKeeper
>     I0919 15:47:02.006853 13009 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.202:5050) is detected
>     I0919 15:47:02.006959 13009 slave.cpp:895] New master detected at 
> master@10.142.55.202:5050
>     I0919 15:47:02.006986 13009 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:47:02.007025 13009 slave.cpp:927] Detecting new master
>     I0919 15:47:02.007061 13009 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:47:16.406669 13008 slave.cpp:4591] Current disk usage 72.53%. 
> Max allowed age: 1.223184189090440days
>     I0919 15:48:02.950891 13005 slave.cpp:3732] master@10.142.55.202:5050 
> exited
>     W0919 15:48:02.950994 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:48:12.006634 13005 detector.cpp:152] Detected a new leader: 
> (id='690')
>     I0919 15:48:12.006817 13003 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000690' in ZooKeeper
>     I0919 15:48:12.007987 13003 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.190:5050) is detected
>     I0919 15:48:12.008126 13003 slave.cpp:895] New master detected at 
> master@10.142.55.190:5050
>     I0919 15:48:12.008210 13003 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:48:12.008280 13003 slave.cpp:927] Detecting new master
>     I0919 15:48:12.008191 13008 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:48:16.409266 13003 slave.cpp:4591] Current disk usage 72.54%. 
> Max allowed age: 1.222480623542604days
>     I0919 15:49:12.379010 13009 slave.cpp:3732] master@10.142.55.190:5050 
> exited
>     W0919 15:49:12.379149 13009 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:49:12.379233 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 12: Transport endpoint is not connected
>     I0919 15:49:16.413767 13007 slave.cpp:4591] Current disk usage 72.64%. 
> Max allowed age: 1.215032005677465days
>     I0919 15:49:24.016290 13007 detector.cpp:152] Detected a new leader: 
> (id='693')
>     I0919 15:49:24.016417 13007 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000693' in ZooKeeper
>     I0919 15:49:24.018273 13007 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.196:5050) is detected
>     I0919 15:49:24.018437 13007 slave.cpp:895] New master detected at 
> master@10.142.55.196:5050
>     I0919 15:49:24.018523 13007 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:49:24.018604 13007 slave.cpp:927] Detecting new master
>     I0919 15:49:24.018496 13008 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:50:16.416391 13008 slave.cpp:4591] Current disk usage 72.64%. 
> Max allowed age: 1.215016710774248days
>     I0919 15:50:24.065268 13003 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:50:24.065342 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:50:24.485752 13004 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:50:24.485839 13004 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:24.485977 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     I0919 15:50:28.343647 13003 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:50:28.343719 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:28.343819 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     I0919 15:50:31.545099 13005 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:50:31.545171 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:31.545284 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     I0919 15:50:32.007096 13008 detector.cpp:152] Detected a new leader: 
> (id='694')
>     I0919 15:50:32.007195 13008 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000694' in ZooKeeper
>     I0919 15:50:32.009881 13008 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.202:5050) is detected
>     I0919 15:50:32.009970 13008 slave.cpp:895] New master detected at 
> master@10.142.55.202:5050
>     I0919 15:50:32.009994 13008 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:50:32.010030 13008 slave.cpp:927] Detecting new master
>     I0919 15:50:32.010079 13008 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:51:16.417846 13006 slave.cpp:4591] Current disk usage 72.64%. 
> Max allowed age: 1.214964708103322days
>     I0919 15:51:32.560317 13003 slave.cpp:3732] master@10.142.55.202:5050 
> exited
>     W0919 15:51:32.560410 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:51:42.005147 13009 detector.cpp:152] Detected a new leader: 
> (id='696')
>     I0919 15:51:42.005265 13009 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000696' in ZooKeeper
>     I0919 15:51:42.006824 13009 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.190:5050) is detected
>     I0919 15:51:42.006904 13009 slave.cpp:895] New master detected at 
> master@10.142.55.190:5050
>     I0919 15:51:42.006928 13009 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:51:42.006963 13009 slave.cpp:927] Detecting new master
>     I0919 15:51:42.006999 13009 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:52:16.419373 13003 slave.cpp:4591] Current disk usage 72.71%. 
> Max allowed age: 1.209981628636250days
>     I0919 15:52:42.336305 13002 slave.cpp:3732] master@10.142.55.190:5050 
> exited
>     W0919 15:52:42.336426 13002 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:52:54.005267 13005 detector.cpp:152] Detected a new leader: 
> (id='699')
>     I0919 15:52:54.005408 13005 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000699' in ZooKeeper
>     I0919 15:52:54.006206 13005 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.196:5050) is detected
>     I0919 15:52:54.006285 13005 slave.cpp:895] New master detected at 
> master@10.142.55.196:5050
>     I0919 15:52:54.006309 13005 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:52:54.006398 13005 slave.cpp:927] Detecting new master
>     I0919 15:52:54.006451 13005 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:53:16.420258 13005 slave.cpp:4591] Current disk usage 72.76%. 
> Max allowed age: 1.206748286096840days
>     I0919 15:53:54.071012 13005 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:53:54.071143 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:54:01.105780 13002 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:54:01.105854 13002 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:54:01.105970 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     I0919 15:54:05.733837 13007 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:54:05.733932 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:54:05.734071 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     E0919 15:54:05.818560 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     I0919 15:54:05.818583 13003 slave.cpp:3732] master@10.142.55.196:5050 
> exited
>     W0919 15:54:05.818758 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     I0919 15:54:06.004385 13009 detector.cpp:152] Detected a new leader: 
> (id='700')
>     I0919 15:54:06.004494 13009 group.cpp:706] Trying to get 
> '/mesos/json.info_0000000700' in ZooKeeper
>     I0919 15:54:06.005511 13009 zookeeper.cpp:259] A new leading master 
> (UPID=master@10.142.55.202:5050) is detected
>     I0919 15:54:06.005586 13009 slave.cpp:895] New master detected at 
> master@10.142.55.202:5050
>     I0919 15:54:06.005609 13009 slave.cpp:916] No credentials provided. 
> Attempting to register without authentication
>     I0919 15:54:06.005676 13009 slave.cpp:927] Detecting new master
>     I0919 15:54:06.005720 13009 status_update_manager.cpp:174] Pausing 
> sending status updates
>     I0919 15:54:16.423193 13002 slave.cpp:4591] Current disk usage 72.76%. 
> Max allowed age: 1.206699342406551days
> slave warn log
>     Log file created at: 2016/09/19 15:42:08
>     Running on machine: ubuntu12
>     Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>     W0919 15:42:08.419035 13006 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:43:20.081374 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:43:26.855315 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:43:26.855159 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 12: Transport endpoint is not connected
>     W0919 15:44:32.535522 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:45:42.354857 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:46:54.075592 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:46:56.098012 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     W0919 15:46:56.098253 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:46:56.462254 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     W0919 15:46:56.462540 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:48:02.950994 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:49:12.379149 13009 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:49:12.379233 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 12: Transport endpoint is not connected
>     W0919 15:50:24.065342 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:50:24.485839 13004 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:24.485977 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     W0919 15:50:28.343719 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:28.343819 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     W0919 15:50:31.545171 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:50:31.545284 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 14: Transport endpoint is not connected
>     W0919 15:51:32.560410 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:52:42.336426 13002 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:53:54.071143 13005 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:54:01.105854 13002 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:54:01.105970 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     W0919 15:54:05.733932 13007 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     E0919 15:54:05.734071 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     E0919 15:54:05.818560 13010 process.cpp:2105] Failed to shutdown socket 
> with fd 15: Transport endpoint is not connected
>     W0919 15:54:05.818758 13003 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected
>     W0919 15:55:06.821486 13009 slave.cpp:3737] Master disconnected! Waiting 
> for a new master to be elected



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)