[
https://issues.apache.org/jira/browse/MESOS-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085536#comment-15085536
]
Lei Xu commented on MESOS-4297:
-------------------------------
Here are some master logs from when I killed the task.
{code}
./mesos-master.WARNING:W0106 19:47:12.636579 1548 master.cpp:4408] Ignoring
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task
driver-20151230225518-0013 of framework
20151228-163100-504125962-5050-31081-0003 from slave
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:47:52.453431 1547 master.cpp:4408] Ignoring
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task
driver-20151230225518-0013 of framework
20151228-163100-504125962-5050-31081-0003 from slave
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:49:12.115389 1550 master.cpp:4408] Ignoring
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task
driver-20151230225518-0013 of framework
20151228-163100-504125962-5050-31081-0003 from slave
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:51:52.144099 1543 master.cpp:4408] Ignoring
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task
driver-20151230225518-0013 of framework
20151228-163100-504125962-5050-31081-0003 from slave
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:52:39.169888 1549 master.cpp:4408] Ignoring
status update TASK_FAILED (UUID: ab05e568-f04f-42dc-bdbe-40e19b421c95) for task
driver-20151230223633-0011 of framework
20151228-163100-504125962-5050-31081-0003 from slave
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S126 at slave(1)@10.90.27.76:5051
(l-qosslave25.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:57:12.453138 1549 master.cpp:4408] Ignoring
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task
driver-20151230225518-0013 of framework
20151228-163100-504125962-5050-31081-0003 from slave
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 20:02:39.168820 1545 master.cpp:4408] Ignoring
status update TASK_FAILED (UUID: ab05e568-f04f-42dc-bdbe-40e19b421c95) for task
driver-20151230223633-0011 of framework
20151228-163100-504125962-5050-31081-0003 from slave
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S126 at slave(1)@10.90.27.76:5051
(l-qosslave25.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 20:07:12.110839 1548 master.cpp:4408] Ignoring
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task
driver-20151230225518-0013 of framework
20151228-163100-504125962-5050-31081-0003 from slave
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 20:12:39.215056 1543 master.cpp:4408] Ignoring
status update TASK_FAILED (UUID: ab05e568-f04f-42dc-bdbe-40e19b421c95) for task
driver-20151230223633-0011 of framework
20151228-163100-504125962-5050-31081-0003 from slave
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S126 at slave(1)@10.90.27.76:5051
(l-qosslave25.ops.cn2.qunar.com) because the framework is unknown
{code}
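To see how often each status update is retried, the warnings above can be grouped by task and UUID. This is only a minimal sketch and not part of Mesos: the function name `summarize` and the regular expression are assumptions written against the log lines shown in this comment, and they may need adjusting for other master log formats.

```python
import re
from collections import defaultdict

# Matches one unwrapped WARNING entry of the form shown above.
# The pattern is an assumption derived from these log lines only.
ENTRY = re.compile(
    r"W(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+ .*?"
    r"Ignoring status update (?P<state>\w+) \(UUID: (?P<uuid>[0-9a-f-]+)\) "
    r"for task (?P<task>\S+) of framework (?P<framework>\S+) "
    r"from slave (?P<slave>\S+)"
)


def summarize(log_text):
    """Group ignored status updates by (task, UUID) and list their timestamps."""
    # The mail wraps each entry over several lines, so collapse all
    # whitespace to single spaces before matching.
    flat = " ".join(log_text.split())
    seen = defaultdict(list)
    for m in ENTRY.finditer(flat):
        seen[(m["task"], m["uuid"])].append(m["time"])
    return dict(seen)
```

Run against the log above, this groups the entries into two (task, UUID) pairs, one for `driver-20151230225518-0013` and one for `driver-20151230223633-0011`. Each pair carries several timestamps, which makes it visible that the slaves keep resending the same update while the master keeps ignoring it.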
> Executor does not shutdown when framework teardown.
> ---------------------------------------------------
>
> Key: MESOS-4297
> URL: https://issues.apache.org/jira/browse/MESOS-4297
> Project: Mesos
> Issue Type: Bug
> Components: framework
> Affects Versions: 0.25.0
> Environment: Marathon 0.11.0
> Mesos 0.25.0
> Spark 1.5.2
> Reporter: Lei Xu
> Priority: Critical
>
> We found a problem when tearing down a Spark framework on Mesos: the
> executors did not exit and are still running.
> {code}
> root 48548 48539 2 2015 ? 04:28:11 /home/q/java/default/bin/java
> -cp
> /home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/conf/:/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/lib/spark-assembly-1.5.2-hadoop2.2.0.jar
> -Xms8192m -Xmx8192m org.apache.spark.executor.CoarseGrainedExecutorBackend
> --driver-url
> akka.tcp://[email protected]:47938/user/CoarseGrainedScheduler
> --executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/3 --hostname
> l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id
> 20151228-163100-504125962-5050-31081-0016
> root 48644 48348 0 2015 ? 00:00:00 sh -c cd spark-1*;
> ./bin/spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend
> --driver-url
> akka.tcp://[email protected]:47938/user/CoarseGrainedScheduler
> --executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/5 --hostname
> l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id
> 20151228-163100-504125962-5050-31081-0016
> root 48645 48644 2 2015 ? 04:28:45 /home/q/java/default/bin/java
> -cp
> /home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/5/runs/851073c4-d225-426b-b1b5-3d294eb76f8e/spark-1.5.2-bin-2.2.0/conf/:/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/5/runs/851073c4-d225-426b-b1b5-3d294eb76f8e/spark-1.5.2-bin-2.2.0/lib/spark-assembly-1.5.2-hadoop2.2.0.jar
> -Xms8192m -Xmx8192m org.apache.spark.executor.CoarseGrainedExecutorBackend
> --driver-url
> akka.tcp://[email protected]:47938/user/CoarseGrainedScheduler
> --executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/5 --hostname
> l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id
> 20151228-163100-504125962-5050-31081-0016
> {code}
> The framework {{20151228-163100-504125962-5050-31081-0016}} was torn down a
> few days ago and no longer appears on the "Frameworks" page of the web UI.
> On the slave page, however, it is still registered with the slave node and
> is still running some executors.
> When I tried to kill the framework again through the REST API, it returned
> {{No framework found with specified ID}}.
> Finally, I killed the Spark task and the Mesos executor. The framework has
> not started any new tasks, but it is still listed on this slave and does
> not exit.
> {code}
> Frameworks
> ID                User  Name                         Active Tasks  CPUs (Used / Allocated)  Mem (Used / Allocated)
> …5050-31081-0016  root  wireless-m_invocation_kylin  0             / 0.6                    / 192 MB
> Executors
> ID  Name                                                           Source  Active Tasks  Queued Tasks  CPUs (Used / Allocated)  Mem (Used / Allocated)
> 5   Command Executor (Task: 5) (Command: sh -c 'cd spark-1*;...')  5       0             0             / 0.1                    / 32 MB                 Sandbox
> 4   Command Executor (Task: 4) (Command: sh -c 'cd spark-1*;...')  4       0             0             / 0.1                    / 32 MB                 Sandbox
> 3   Command Executor (Task: 3) (Command: sh -c 'cd spark-1*;...')  3       0             0             / 0.1                    / 32 MB                 Sandbox
> 2   Command Executor (Task: 2) (Command: sh -c 'cd spark-1*;...')  2       0             0             / 0.1                    / 32 MB                 Sandbox
> 1   Command Executor (Task: 1) (Command: sh -c 'cd spark-1*;...')  1       0             0             / 0.1                    / 32 MB                 Sandbox
> 0   Command Executor (Task: 0) (Command: sh -c 'cd spark-1*;...')  0       0             0             / 0.1                    / 32 MB                 Sandbox
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)