[jira] [Updated] (MESOS-3719) Core dump on /teardown

Ken Sipe (JIRA) Tue, 13 Oct 2015 08:57:41 -0700

     [ https://issues.apache.org/jira/browse/MESOS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ken Sipe updated MESOS-3719:
----------------------------
    Description: 
Invoked `/master/teardown` for two frameworks. A sample invocation (run on the master node, with the `master.mesos` hostname resolved by mesos-dns) is:
`curl -d "frameworkId=20151013-143739-1510211594-5050-1515-0002" -X POST http://master.mesos:5050/master/teardown`
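The same teardown can be scripted. The following is a minimal Python sketch of the curl command above, using only the standard library. `teardown_request`, `teardown_all` and `MASTER` are hypothetical names. The sketch assumes `master.mesos` resolves through mesos-dns, as in the report. Only the first framework ID appears in this report, so the caller has to supply the second. Like `curl -d`, the sketch sends a form-encoded POST with a single `frameworkId` field.

```python
import urllib.parse
import urllib.request
from typing import Dict, Iterable

# Assumption: run where mesos-dns resolves master.mesos, as in the report.
MASTER = "http://master.mesos:5050"


def teardown_request(framework_id: str, master: str = MASTER) -> urllib.request.Request:
    """Build, without sending, the request the curl command makes:
    a form-encoded POST to /master/teardown carrying frameworkId."""
    body = urllib.parse.urlencode({"frameworkId": framework_id}).encode("ascii")
    return urllib.request.Request(
        f"{master}/master/teardown",
        data=body,
        method="POST",
        # curl -d sends this content type by default.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def teardown_all(framework_ids: Iterable[str], master: str = MASTER,
                 timeout: float = 10.0) -> Dict[str, int]:
    """Tear down frameworks one at a time and record each HTTP status code.
    Raises urllib.error.URLError if the master is unreachable (e.g. restarting)."""
    statuses = {}
    for fid in framework_ids:
        with urllib.request.urlopen(teardown_request(fid, master), timeout=timeout) as resp:
            statuses[fid] = resp.status
    return statuses


if __name__ == "__main__":
    # Only builds and prints the request; nothing is sent to a master here.
    req = teardown_request("20151013-143739-1510211594-5050-1515-0002")
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```

`teardown_request` only builds the request, so it can be inspected without a running master. `teardown_all` sends the requests one after another, which matches tearing down two frameworks in sequence. In this report the master aborted while handling a teardown, so a later request may fail with a connection error while systemd restarts the master.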


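Master logs. The master aborts on a failed `CHECK` in `DRFSorter::remove()` while the allocator removes the framework, and systemd then restarts it: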

{code}
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.677880  1525 http.cpp:321] HTTP POST for /master/teardown from 10.0.4.90:53789 with User-Agent='curl/7.42.1'
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679015  1525 master.cpp:5112] Removing framework 20151013-143739-1510211594-5050-1515-0002 (hdfs) at [email protected]:53903
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679280  1525 master.cpp:5576] Updating the latest state of task task.journalnode.journalnode.NodeExecutor.1444747955695 of framework 20151013-143739-1510211594-5050-1515-0002 to TASK_KILLED
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679385  1525 master.cpp:5644] Removing task task.journalnode.journalnode.NodeExecutor.1444747955695 with resources cpus(*):0.25; mem(*):691.2 of framework 20151013-143739-1510211594-5050-1515-0002 on
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679404  1521 hierarchical.hpp:814] Recovered cpus(*):0.25; mem(*):691.2 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, a
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679536  1525 master.cpp:5673] Removing executor 'executor.journalnode.NodeExecutor.1444747955695' with resources cpus(*):0.1; mem(*):345.6 of framework 20151013-143739-1510211594-5050-1515-0002 on sl
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.679641  1521 hierarchical.hpp:814] Recovered cpus(*):0.1; mem(*):345.6 (total: ports(*):[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 8182-32000]; cpus(*):4; mem(*):14019; disk(*):32541, al
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: F1013 15:36:24.679719  1521 sorter.cpp:213] Check failed: total.resources.contains(slaveId)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: *** Check failure stack trace: ***
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @     0x7fba6d86c9fd  google::LogMessage::Fail()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @     0x7fba6d86e89d  google::LogMessage::SendToLog()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @     0x7fba6d86c5ec  google::LogMessage::Flush()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @     0x7fba6d86f1be  google::LogMessageFatal::~LogMessageFatal()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @     0x7fba6e3f2910  mesos::internal::master::allocator::DRFSorter::remove()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @     0x7fba6e2cc0bc  mesos::internal::master::allocator::HierarchicalAllocatorProcess<>::removeFramework()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @     0x7fba6e977551  process::ProcessManager::resume()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @     0x7fba6e97784f  process::internal::schedule()
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: I1013 15:36:24.684733  1520 http.cpp:321] HTTP GET for /master/state-summary from 10.0.4.90:53790 with User-Agent='Python-urllib/3.4'
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @     0x7fba6d30ebc3  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @     0x7fba6cb1266c  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: @     0x7fba6c8552ed  (unknown)
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Main process exited, code=killed, status=6/ABRT
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Unit entered failed state.
Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Failed with result 'signal'.
Oct 13 15:36:40 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: dcos-mesos-master.service: Service hold-off time over, scheduling restart.
Oct 13 15:36:40 ip-10-0-4-90.us-west-2.compute.internal systemd[1]: Starting Mesos Master...
{code}
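When triaging a crash like this, the FATAL glog record is usually the line worth quoting, since failed `CHECK`s are logged at FATAL severity. The following is a minimal Python sketch that pulls those records out of journald output. It assumes each record sits on a single line, as `journalctl` prints it. It also assumes the standard glog prefix `Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg`. `parse_glog`, `fatal_entries`, `GlogEntry` and `SAMPLE` are hypothetical names, and the sample lines are copied from this log.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed glog prefix: severity (I/W/E/F), MMDD, hh:mm:ss.uuuuuu, thread id,
# file:line], then the message. journald adds its own prefix before it.
GLOG_RE = re.compile(
    r"(?P<sev>[IWEF])\d{4} "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<tid>\d+) "
    r"(?P<file>[^\s:\]]+):(?P<line>\d+)\] (?P<msg>.*)"
)


class GlogEntry(NamedTuple):
    severity: str
    time: str
    thread: int
    source: str  # "file:line"
    message: str


def parse_glog(line: str) -> Optional[GlogEntry]:
    """Return the glog record embedded in a journald line, or None."""
    m = GLOG_RE.search(line)
    if m is None:
        return None  # e.g. systemd lines or '@ 0x...' stack-trace frames
    return GlogEntry(m["sev"], m["time"], int(m["tid"]),
                     f"{m['file']}:{m['line']}", m["msg"])


def fatal_entries(lines: Iterable[str]) -> List[GlogEntry]:
    """Keep only FATAL ('F') records, which is how failed CHECKs are logged."""
    return [e for e in map(parse_glog, lines) if e is not None and e.severity == "F"]


# Three records from this log, each on a single line.
PREFIX = "Oct 13 15:36:24 ip-10-0-4-90.us-west-2.compute.internal mesos-master[1515]: "
SAMPLE = [
    PREFIX + "I1013 15:36:24.677880  1525 http.cpp:321] HTTP POST for "
    "/master/teardown from 10.0.4.90:53789 with User-Agent='curl/7.42.1'",
    PREFIX + "F1013 15:36:24.679719  1521 sorter.cpp:213] Check failed: "
    "total.resources.contains(slaveId)",
    PREFIX + "@     0x7fba6e3f2910  mesos::internal::master::allocator::DRFSorter::remove()",
]

if __name__ == "__main__":
    for e in fatal_entries(SAMPLE):
        # prints: sorter.cpp:213 Check failed: total.resources.contains(slaveId)
        print(e.source, e.message)
```

The regex uses `search` rather than `match` because the journald prefix (timestamp, host, `mesos-master[1515]:`) precedes the glog record. Stack-trace frames and systemd lines do not match, so they are skipped.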


> Core dump on /teardown
> ----------------------
>
>                 Key: MESOS-3719
>                 URL: https://issues.apache.org/jira/browse/MESOS-3719
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.24.1
>            Reporter: Ken Sipe



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
