[ 
https://issues.apache.org/jira/browse/MESOS-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Kolloch updated MESOS-3744:
---------------------------------
    Description: 
The crash happened shortly after calling teardown. The teardown was initiated 
by using httpie with:

http -f -v POST "$MASTER_BASE_URL/teardown" "frameworkId=$FRAMEWORK"
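
For readers without httpie, the same request can be sketched with Python's standard library. This is a minimal sketch, assuming httpie's -f flag sends the field as a form-encoded body. The master URL and framework ID are placeholders rather than values from this report, and build_teardown_request is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_teardown_request(master_base_url: str, framework_id: str) -> Request:
    """Build (but do not send) a form-encoded POST to the master's /teardown endpoint."""
    body = urlencode({"frameworkId": framework_id}).encode("ascii")
    return Request(
        url=master_base_url.rstrip("/") + "/teardown",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values; substitute the real master address and framework ID.
req = build_teardown_request("http://master.example:5050", "example-framework-id")
print(req.get_method(), req.full_url, req.data.decode("ascii"))
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```

Building the request separately from sending it makes it possible to inspect the exact URL and body before anything reaches a live master.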

Below you will find the master-fail.log over the relevant time interval. Here 
are the last log lines before the mesos master died:

Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
F1015 13:13:21.511503 23038 sorter.cpp:213] Check failed: 
total.resources.contains(slaveId)
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
*** Check failure stack trace: ***
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd1860169fd  google::LogMessage::Fail()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd18601889d  google::LogMessage::SendToLog()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd1860165ec  google::LogMessage::Flush()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd1860191be  google::LogMessageFatal::~LogMessageFatal()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd186af3ea0  mesos::internal::master::allocator::DRFSorter::remove()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd1869d6dec  
mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd186fbdab9  process::ProcessManager::resume()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd186fbddaf  process::schedule()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd1852bc66c  (unknown)
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd184fff2ed  (unknown)
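
To pull the failing check out of a longer journal, the glog prefix of each entry can be parsed. The following is a rough sketch, assuming the usual glog layout (severity letter, mmdd, time with microseconds, thread id, file:line). GLOG_LINE and parse_glog are hypothetical names, and the sample entry is the fatal line from above with the mail-induced line wrap removed.

```python
import re

# glog prefix: severity letter, mmdd, time with microseconds, thread id, file:line.
GLOG_LINE = re.compile(
    r"(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)"
)


def parse_glog(entry: str):
    """Return the glog fields found in a log entry, or None if there are none."""
    match = GLOG_LINE.search(entry)
    return match.groupdict() if match else None


# The fatal entry from above, with the mail-induced line wrap removed.
fatal = (
    "Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: "
    "F1015 13:13:21.511503 23038 sorter.cpp:213] Check failed: "
    "total.resources.contains(slaveId)"
)
fields = parse_glog(fatal)
print(fields["severity"], fields["file"], fields["line"], fields["message"])
```

Filtering on a severity of "F" is one way to isolate entries like this one in the attached master-fail.log.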

I am not sure whether it matters, but in this case multiple framework 
instances registered with the same framework name.

Here is an excerpt from the startup of the affected Mesos master, included 
because it shows the software versions in use:

Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.454946 18936 logging.cpp:172] INFO level logging started!
Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.455173 18936 main.cpp:181] Build: 2015-09-28 19:50:01 by
Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.455199 18936 main.cpp:183] Version: 0.23.0
Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.455215 18936 main.cpp:190] Git SHA: 
7d15294f46b5062c59818f4d062044ac04349dc1
Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.455294 18936 main.cpp:204] Using 'HierarchicalDRF' allocator
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.016752 18936 leveldb.cpp:176] Opened db in 561.344642ms
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.158462 18936 leveldb.cpp:183] Compacted db in 141.288563ms
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.158534 18936 leveldb.cpp:198] Created db iterator in 13783ns
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.158572 18936 leveldb.cpp:204] Seeked to beginning of db in 
10366ns
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.158673 18936 leveldb.cpp:273] Iterated through 3 keys in the db 
in 78606ns
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.158733 18936 replica.cpp:744] Replica recovered with log 
positions 125 -> 126 with 0 holes and 0 unlearned
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@723: Client 
environment:os.name=Linux
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@724: Client 
environment:os.arch=4.0.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@725: Client 
environment:os.version=#2 SMP Fri Jul 10 01:01:50 UTC 2015
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@733: Client 
environment:user.name=(null)
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@741: Client 
environment:user.home=/root
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@753: Client 
environment:user.dir=/
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@zookeeper_init@786: 
Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 
watcher=0x7f0532095480 sessionId=0 sessionPasswd=<null> context=0x7f0504001130 
flags=0
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.160876 18936 main.cpp:383] Starting Mesos master
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@716: Client 
environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f0528cd3700):ZOO_INFO@check_events@1703: 
initiated connection to server [127.0.0.1:2181]
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.161655 18936 master.cpp:368] Master 
20151015-131338-3674472458-5050-18936 (10.0.4.219) started on 10.0.4.219:5050
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.161357 18942 log.cpp:238] Attempting to join replica to 
ZooKeeper group
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,162:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162201 18936 master.cpp:370] Flags at startup: 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" 
--cluster="peter-p70wxd2" --framework_sorter="drf" --help="false" 
--hostname="10.0.4.219" --initialize_driver_logging="true" --ip="10.0.4.219" 
--log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" 
--quiet="false" --quorum="1" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="5secs" --registry_strict="false" 
--roles="slave_public" --root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/opt/mesosphere/packages/mesos--d43a8eb9946a5c1c5ec05fb21922a2fdf41775b2/share/mesos/webui"
 --weights="slave_public=1" --work_dir="/var/lib/mesos/master" 
--zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162433 18936 master.cpp:417] Master allowing unauthenticated 
frameworks to register
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162454 18936 master.cpp:422] Master allowing unauthenticated 
slaves to register
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162480 18936 master.cpp:459] Using default 'crammd5' 
authenticator
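
The "Flags at startup" entry is easier to compare across masters once it is turned into a mapping. A small sketch, assuming every flag has the --name="value" shape seen in the log above. FLAG and parse_flags are hypothetical names, and the excerpt holds only a few of the logged flags.

```python
import re

# Matches flags of the form --name="value" as they appear in the startup entry.
FLAG = re.compile(r'--(?P<name>[A-Za-z_]+)="(?P<value>[^"]*)"')


def parse_flags(flags_line: str) -> dict:
    """Map each flag name to its quoted value."""
    return {m["name"]: m["value"] for m in FLAG.finditer(flags_line)}


# A few of the flags from the startup entry above.
excerpt = (
    '--allocator="HierarchicalDRF" --framework_sorter="drf" --quorum="1" '
    '--roles="slave_public" --user_sorter="drf" --weights="slave_public=1"'
)
flags = parse_flags(excerpt)
print(flags["allocator"], flags["roles"], flags["weights"])
```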

  was:
The crash happened shortly after calling teardown. The teardown was initiated 
by using httpie with:

http -f -v POST "$MASTER_BASE_URL/teardown" "frameworkId=$FRAMEWORK"

Below you will find the master-fail.log over the relevant time interval. Here 
are the last log lines before the mesos master died:

Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
F1015 13:13:21.511503 23038 sorter.cpp:213] Check failed: 
total.resources.contains(slaveId)
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
*** Check failure stack trace: ***
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd1860169fd  google::LogMessage::Fail()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd18601889d  google::LogMessage::SendToLog()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd1860165ec  google::LogMessage::Flush()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd1860191be  google::LogMessageFatal::~LogMessageFatal()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd186af3ea0  mesos::internal::master::allocator::DRFSorter::remove()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd1869d6dec  
mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd186fbdab9  process::ProcessManager::resume()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd186fbddaf  process::schedule()
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd1852bc66c  (unknown)
Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: @ 
    0x7fd184fff2ed  (unknown)

Here is an excerpt from the startup of the affected Mesos master, included 
because it shows the software versions in use:

Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.454946 18936 logging.cpp:172] INFO level logging started!
Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.455173 18936 main.cpp:181] Build: 2015-09-28 19:50:01 by
Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.455199 18936 main.cpp:183] Version: 0.23.0
Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.455215 18936 main.cpp:190] Git SHA: 
7d15294f46b5062c59818f4d062044ac04349dc1
Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:37.455294 18936 main.cpp:204] Using 'HierarchicalDRF' allocator
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.016752 18936 leveldb.cpp:176] Opened db in 561.344642ms
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.158462 18936 leveldb.cpp:183] Compacted db in 141.288563ms
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.158534 18936 leveldb.cpp:198] Created db iterator in 13783ns
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.158572 18936 leveldb.cpp:204] Seeked to beginning of db in 
10366ns
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.158673 18936 leveldb.cpp:273] Iterated through 3 keys in the db 
in 78606ns
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.158733 18936 replica.cpp:744] Replica recovered with log 
positions 125 -> 126 with 0 holes and 0 unlearned
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@723: Client 
environment:os.name=Linux
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@724: Client 
environment:os.arch=4.0.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@725: Client 
environment:os.version=#2 SMP Fri Jul 10 01:01:50 UTC 2015
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@733: Client 
environment:user.name=(null)
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@741: Client 
environment:user.home=/root
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@753: Client 
environment:user.dir=/
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@zookeeper_init@786: 
Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 
watcher=0x7f0532095480 sessionId=0 sessionPasswd=<null> context=0x7f0504001130 
flags=0
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.160876 18936 main.cpp:383] Starting Mesos master
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@716: Client 
environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f0528cd3700):ZOO_INFO@check_events@1703: 
initiated connection to server [127.0.0.1:2181]
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.161655 18936 master.cpp:368] Master 
20151015-131338-3674472458-5050-18936 (10.0.4.219) started on 10.0.4.219:5050
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.161357 18942 log.cpp:238] Attempting to join replica to 
ZooKeeper group
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,161:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
2015-10-15 13:13:38,162:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162201 18936 master.cpp:370] Flags at startup: 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" 
--cluster="peter-p70wxd2" --framework_sorter="drf" --help="false" 
--hostname="10.0.4.219" --initialize_driver_logging="true" --ip="10.0.4.219" 
--log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" 
--quiet="false" --quorum="1" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="5secs" --registry_strict="false" 
--roles="slave_public" --root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/opt/mesosphere/packages/mesos--d43a8eb9946a5c1c5ec05fb21922a2fdf41775b2/share/mesos/webui"
 --weights="slave_public=1" --work_dir="/var/lib/mesos/master" 
--zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162433 18936 master.cpp:417] Master allowing unauthenticated 
frameworks to register
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162454 18936 master.cpp:422] Master allowing unauthenticated 
slaves to register
Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
I1015 13:13:38.162480 18936 master.cpp:459] Using default 'crammd5' 
authenticator


> Master crashes when tearing down framework
> ------------------------------------------
>
>                 Key: MESOS-3744
>                 URL: https://issues.apache.org/jira/browse/MESOS-3744
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>    Affects Versions: 0.23.0
>            Reporter: Peter Kolloch
>         Attachments: master-fail.log
>
>
> The crash happened shortly after calling teardown. The teardown was initiated 
> by using httpie with:
> http -f -v POST "$MASTER_BASE_URL/teardown" "frameworkId=$FRAMEWORK"
> Below you will find the master-fail.log over the relevant time interval. Here 
> are the last log lines before the mesos master died:
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> F1015 13:13:21.511503 23038 sorter.cpp:213] Check failed: 
> total.resources.contains(slaveId)
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> *** Check failure stack trace: ***
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> @     0x7fd1860169fd  google::LogMessage::Fail()
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> @     0x7fd18601889d  google::LogMessage::SendToLog()
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> @     0x7fd1860165ec  google::LogMessage::Flush()
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> @     0x7fd1860191be  google::LogMessageFatal::~LogMessageFatal()
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> @     0x7fd186af3ea0  mesos::internal::master::allocator::DRFSorter::remove()
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> @     0x7fd1869d6dec  
> mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> @     0x7fd186fbdab9  process::ProcessManager::resume()
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> @     0x7fd186fbddaf  process::schedule()
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> @     0x7fd1852bc66c  (unknown)
> Oct 15 13:13:21 ip-10-0-4-219.us-west-2.compute.internal mesos-master[23032]: 
> @     0x7fd184fff2ed  (unknown)
> I am not sure whether it matters, but in this case multiple framework 
> instances registered with the same framework name.
> Here is an excerpt from the startup of the affected Mesos master, included 
> because it shows the software versions in use:
> Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:37.454946 18936 logging.cpp:172] INFO level logging started!
> Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:37.455173 18936 main.cpp:181] Build: 2015-09-28 19:50:01 by
> Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:37.455199 18936 main.cpp:183] Version: 0.23.0
> Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:37.455215 18936 main.cpp:190] Git SHA: 
> 7d15294f46b5062c59818f4d062044ac04349dc1
> Oct 15 13:13:37 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:37.455294 18936 main.cpp:204] Using 'HierarchicalDRF' allocator
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.016752 18936 leveldb.cpp:176] Opened db in 561.344642ms
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.158462 18936 leveldb.cpp:183] Compacted db in 141.288563ms
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.158534 18936 leveldb.cpp:198] Created db iterator in 13783ns
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.158572 18936 leveldb.cpp:204] Seeked to beginning of db in 
> 10366ns
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.158673 18936 leveldb.cpp:273] Iterated through 3 keys in the 
> db in 78606ns
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.158733 18936 replica.cpp:744] Replica recovered with log 
> positions 125 -> 126 with 0 holes and 0 unlearned
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
> environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@723: Client 
> environment:os.name=Linux
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@724: Client 
> environment:os.arch=4.0.5
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@725: Client 
> environment:os.version=#2 SMP Fri Jul 10 01:01:50 UTC 2015
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@733: Client 
> environment:user.name=(null)
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@741: Client 
> environment:user.home=/root
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@log_env@753: Client 
> environment:user.dir=/
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,159:18936(0x7f052aee3700):ZOO_INFO@zookeeper_init@786: 
> Initiating client connection, host=127.0.0.1:2181 sessionTimeout=10000 
> watcher=0x7f0532095480 sessionId=0 sessionPasswd=<null> 
> context=0x7f0504001130 flags=0
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.160876 18936 main.cpp:383] Starting Mesos master
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f052bee5700):ZOO_INFO@log_env@716: Client 
> environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f0528cd3700):ZOO_INFO@check_events@1703: 
> initiated connection to server [127.0.0.1:2181]
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.161655 18936 master.cpp:368] Master 
> 20151015-131338-3674472458-5050-18936 (10.0.4.219) started on 10.0.4.219:5050
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.161357 18942 log.cpp:238] Attempting to join replica to 
> ZooKeeper group
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,161:18936(0x7f052aee3700):ZOO_INFO@log_env@712: Client 
> environment:zookeeper.version=zookeeper C client 3.4.5
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> 2015-10-15 13:13:38,162:18936(0x7f052aee3700):ZOO_INFO@log_env@716: Client 
> environment:host.name=ip-10-0-4-219.us-west-2.compute.internal
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.162201 18936 master.cpp:370] Flags at startup: 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" --cluster="peter-p70wxd2" --framework_sorter="drf" 
> --help="false" --hostname="10.0.4.219" --initialize_driver_logging="true" 
> --ip="10.0.4.219" --log_auto_initialize="true" --log_dir="/var/log/mesos" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --port="5050" --quiet="false" --quorum="1" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --roles="slave_public" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/opt/mesosphere/packages/mesos--d43a8eb9946a5c1c5ec05fb21922a2fdf41775b2/share/mesos/webui"
>  --weights="slave_public=1" --work_dir="/var/lib/mesos/master" 
> --zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.162433 18936 master.cpp:417] Master allowing unauthenticated 
> frameworks to register
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.162454 18936 master.cpp:422] Master allowing unauthenticated 
> slaves to register
> Oct 15 13:13:38 ip-10-0-4-219.us-west-2.compute.internal mesos-master[18936]: 
> I1015 13:13:38.162480 18936 master.cpp:459] Using default 'crammd5' 
> authenticator



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
