[ https://issues.apache.org/jira/browse/MESOS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ken Sipe updated MESOS-3719:
----------------------------
    Description:
invoked a `/master/teardown` for 2 frameworks. sample invocation (on the master node using mesos-dns) is:
`curl -d "frameworkId=20151013-143739-1510211594-5050-1515-0002" -X POST http://master.mesos:5050/master/teardown`

logs at the master:
{code}
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.677880 1525 http.cpp:321] HTTP POST for /master/teardown from 10.0.4.90:53789 with User-Agent='curl/7.42.1'
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679015 1525 master.cpp:5112] Removing framework 20151013-143739-1510211594-5050-1515-0002 (hdfs) at scheduler-a5388720-fcbf-4fd0-b01e-75712b12c99d@10.0.4.90:53903
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679280 1525 master.cpp:5576] Updating the latest state of task task.journalnode.journalnode.NodeExecutor.1444747955695 of framework 20151013-143739-1510211594-5050-1515-0002 to TASK_KILLED
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679385 1525 master.cpp:5644] Removing task task.journalnode.journalnode.NodeExecutor.1444747955695 with resources cpus(*):0.25; mem(*):691.2 of framework 20151013-143739-1510211594-5050-1515-0002 on
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679404 1521 hierarchical.hpp:814] Recovered cpus(*):0.25; mem(*):691.2 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, a
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679536 1525 master.cpp:5673] Removing executor 'executor.journalnode.NodeExecutor.1444747955695' with resources cpus(*):0.1; mem(*):345.6 of framework 20151013-143739-1510211594-5050-1515-0002 on sl
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679641 1521 hierarchical.hpp:814] Recovered cpus(*):0.1; mem(*):345.6 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, al
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: F1013 15:36:24.679719 1521 sorter.cpp:213] Check failed: total.resources.contains(slaveId)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: *** Check failure stack trace: ***
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6d86c9fd google::LogMessage::Fail()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6d86e89d google::LogMessage::SendToLog()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6d86c5ec google::LogMessage::Flush()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6d86f1be google::LogMessageFatal::~LogMessageFatal()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6e3f2910 mesos::internal::master::allocator::DRFSorter::remove()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6e2cc0bc mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6e977551 process::ProcessManager::resume()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6e97784f process::internal::schedule()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.684733 1520 http.cpp:321] HTTP GET for /master/state-summary from 10.0.4.90:53790 with User-Agent='Python-urllib/3.4'
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6d30ebc3 (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6cb1266c (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @ 0x7fba6c8552ed (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Main process exited, code=killed, status=6/ABRT
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Unit entered failed state.
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Failed with result 'signal'.
Oct 13 15:36:40 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Service hold-off time over, scheduling restart.
Oct 13 15:36:40 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: Starting Mesos Master...
{code}

> Core dump on /teardown
> ----------------------
>
>                 Key: MESOS-3719
>                 URL: https://issues.apache.org/jira/browse/MESOS-3719
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.24.1
>            Reporter: Ken Sipe
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)