Usually the executor terminates itself once it reaps a task status of killed or finished. Otherwise, either the reap callback has not been registered yet, or our executor has a bug when reaping the task status. Could you find anything in the executor's stdout/stderr?
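In case it helps, a minimal sketch of where to look, assuming the default work_dir /tmp/mesos shown in your agent log and that the sandbox has not been garbage collected yet. All IDs and paths below are copied from the log you posted; SANDBOX and PID are just local shell variables for illustration:

    # Sandbox of the run from your agent log:
    # <work_dir>/slaves/<agent_id>/frameworks/<framework_id>/executors/<executor_id>/runs/<container_id>
    SANDBOX=/tmp/mesos/slaves/ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13/frameworks/f65b163c-0faf-441f-ac14-91739fa4394c-0000/executors/service.a3b609b8-27ec-11e6-8044-02c89eb9127e/runs/ead42e63-ac92-4ad0-a99c-4af9c3fa5e31

    # Executor output around the TASK_KILLED and the late registration
    tail -n 100 "$SANDBOX/stdout" "$SANDBOX/stderr"

    # Checkpointed executor pid (path taken from your log), to confirm whether the process is still alive
    PID=$(cat /tmp/mesos/meta/slaves/ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13/frameworks/f65b163c-0faf-441f-ac14-91739fa4394c-0000/executors/service.a3b609b8-27ec-11e6-8044-02c89eb9127e/runs/ead42e63-ac92-4ad0-a99c-4af9c3fa5e31/pids/forked.pid)
    ps -fp "$PID"

If that executor is the one that registered at 13:33:39, its stderr should show why it did not exit after the task was killed.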
On Sat, Jun 4, 2016 at 6:08 PM, Tomek Janiszewski <[email protected]> wrote:

> Thanks. I just manually found that executor pid and killed it. Any idea why
> it was still running without tasks?
>
> On Sat, Jun 4, 2016 at 05:35, haosdent <[email protected]> wrote:
>
> > 13:33:39.031054 [slave.cpp:2643] Got registration for executor 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000 from executor(1)@10.55.97.170:60083
> >
> > Yes, according to your log, your executor is still running. If your
> > executor is the http_command_executor, you could use
> > https://github.com/apache/mesos/blob/master/docs/executor-http-api.md#shutdown
> > to shut it down. If it is another type of executor, there is no API to
> > shut down the executor as far as I know. Not sure whether killing the
> > executor on the agent could resolve your problem or not.
> >
> > On Fri, Jun 3, 2016 at 4:33 PM, Tomek Janiszewski <[email protected]> wrote:
> >
> > > Here is the truncated response from slave(1)/state:
> > >
> > > {
> > >   "attributes": {...},
> > >   "completed_frameworks": [],
> > >   "flags": {...},
> > >   "frameworks": [
> > >     {
> > >       "checkpoint": true,
> > >       "completed_executors": [...],
> > >       "executors": [
> > >         {
> > >           "queued_tasks": [],
> > >           "tasks": [],
> > >           "completed_tasks": [
> > >             {
> > >               "discovery": {...},
> > >               "executor_id": "",
> > >               "framework_id": "f65b163c-0faf-441f-ac14-91739fa4394c-0000",
> > >               "id": "service.a3b609b8-27ec-11e6-8044-02c89eb9127e",
> > >               "labels": [...],
> > >               "name": "service",
> > >               "resources": {...},
> > >               "slave_id": "ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13",
> > >               "state": "TASK_KILLED",
> > >               "statuses": []
> > >             }
> > >           ],
> > >           "container": "ead42e63-ac92-4ad0-a99c-4af9c3fa5e31",
> > >           "directory": "...",
> > >           "id": "service.a3b609b8-27ec-11e6-8044-02c89eb9127e",
> > >           "name": "Command Executor (Task: service.a3b609b8-27ec-11e6-8044-02c89eb9127e) (Command: sh -c 'cd service...')",
> > >           "resources": {...},
> > >           "source": "service.a3b609b8-27ec-11e6-8044-02c89eb9127e"
> > >         },
> > >         ...
> > >       ],
> > >     }
> > >   ],
> > >   "git_sha": "961edbd82e691a619a4c171a7aadc9c32957fa73",
> > >   "git_tag": "0.28.0",
> > >   "version": "0.28.0",
> > >   ...
> > > }
> > >
> > > Here is the log for this container:
> > >
> > > > 13:33:19.479182 [slave.cpp:1361] Got assigned task service.a3b609b8-27ec-11e6-8044-02c89eb9127e for framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > 13:33:19.482566 [slave.cpp:1480] Launching task service.a3b609b8-27ec-11e6-8044-02c89eb9127e for framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > 13:33:19.483921 [paths.cpp:528] Trying to chown '/tmp/mesos/slaves/ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13/frameworks/f65b163c-0faf-441f-ac14-91739fa4394c-0000/executors/service.a3b609b8-27ec-11e6-8044-02c89eb9127e/runs/ead42e63-ac92-4ad0-a99c-4af9c3fa5e31' to user 'mesosuser'
> > > > 13:33:19.504173 [slave.cpp:5367] Launching executor service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/tmp/mesos/slaves/ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13/frameworks/f65b163c-0faf-441f-ac14-91739fa4394c-0000/executors/service.a3b609b8-27ec-11e6-8044-02c89eb9127e/runs/ead42e63-ac92-4ad0-a99c-4af9c3fa5e31'
> > > > 13:33:19.505537 [containerizer.cpp:666] Starting container 'ead42e63-ac92-4ad0-a99c-4af9c3fa5e31' for executor 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' of framework 'f65b163c-0faf-441f-ac14-91739fa4394c-0000'
> > > > 13:33:19.505734 [slave.cpp:1698] Queuing task 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' for executor 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > ...
> > > > 13:33:19.977483 [containerizer.cpp:1118] Checkpointing executor's forked pid 25576 to '/tmp/mesos/meta/slaves/ef232fd9-5114-4d8f-adc3-1669c1e6fdc5-S13/frameworks/f65b163c-0faf-441f-ac14-91739fa4394c-0000/executors/service.a3b609b8-27ec-11e6-8044-02c89eb9127e/runs/ead42e63-ac92-4ad0-a99c-4af9c3fa5e31/pids/forked.pid'
> > > > 13:33:35.775195 [slave.cpp:1891] Asked to kill task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > 13:33:35.775645 [slave.cpp:3002] Handling status update TASK_KILLED (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000 from @0.0.0.0:0
> > > > 13:33:35.778105 [cpushare.cpp:389] Updated 'cpu.shares' to 102 (cpus 0.1) for container ead42e63-ac92-4ad0-a99c-4af9c3fa5e31
> > > > 13:33:35.778488 [disk.cpp:169] Updating the disk resources for container ead42e63-ac92-4ad0-a99c-4af9c3fa5e31 to cpus(*):0.1; mem(*):32
> > > > 13:33:35.780349 [mem.cpp:353] Updated 'memory.soft_limit_in_bytes' to 32MB for container ead42e63-ac92-4ad0-a99c-4af9c3fa5e31
> > > > 13:33:35.782573 [status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > 13:33:35.783860 [status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_KILLED (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > 13:33:35.788767 [slave.cpp:3400] Forwarding the update TASK_KILLED (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000 to [email protected]:5050
> > > > 13:33:35.917932 [status_update_manager.cpp:392] Received status update acknowledgement (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > > 13:33:35.918143 [status_update_manager.cpp:824] Checkpointing ACK for status update TASK_KILLED (UUID: eba64915-7df2-483d-8982-a9a46a48a81b) for task service.a3b609b8-27ec-11e6-8044-02c89eb9127e of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000
> > > ...
> > > > 13:33:39.031054 [slave.cpp:2643] Got registration for executor 'service.a3b609b8-27ec-11e6-8044-02c89eb9127e' of framework f65b163c-0faf-441f-ac14-91739fa4394c-0000 from executor(1)@10.55.97.170:60083
> > >
> > > The container is visibly no longer running, but it appears as running. What should I do with it?
> > >
> > > Thanks
> > > Tomek
> > >
> > > On Thu, Jun 2, 2016 at 15:55, Tomek Janiszewski <[email protected]> wrote:
> > >
> > > > Yes. I see the dead executor in executors. Its tasks and queued_tasks are
> > > > empty, but there is one task in completed_tasks. frameworks.completed_executors
> > > > is filled with other executors.
> > > >
> > > > On Thu, Jun 2, 2016 at 15:39, haosdent <[email protected]> wrote:
> > > >
> > > > > Hi, @janiszt. It seems the completed executors only exist
> > > > > in completed_frameworks.completed_executors
> > > > > or frameworks.completed_executors on my side.
> > > > >
> > > > > On your side, does completed_executors exist in any other fields?
> > > > >
> > > > > On Thu, Jun 2, 2016 at 5:39 PM, Tomek Janiszewski <[email protected]> wrote:
> > > > >
> > > > > > Hi
> > > > > >
> > > > > > I'm running Mesos 0.28.0. The Mesos slave(1)/state endpoint returns some
> > > > > > completed executors not in frameworks.completed_executors but in
> > > > > > frameworks.executors.
> > > > > > Is this normal behavior? How can I force Mesos to move completed
> > > > > > executors into frameworks.executors?
> > > > > >
> > > > > > Thanks
> > > > > > Tomek
> > > > >
> > > > > --
> > > > > Best Regards,
> > > > > Haosdent Huang
> >
> > --
> > Best Regards,
> > Haosdent Huang
>

--
Best Regards,
Haosdent Huang
