I printed more logs in mesos-go. The log showed mesos-go received shutdown 
message and invoked callback function Shutdown. The message after invoking 
Shutdown also printed out. But still nothing print out from executor.

this is the log from mesos-go
2017/07/18 20:49:19 Executor driver received shutdown
2017/07/18 20:49:19 Executor driver is asked to shutdown
------------------(Shutdown is invoked here)--------------------
2017/07/18 20:49:19 Stop driver

And I also tested it using example executor provided by mesos-go, and got same 
result.
Then I launched multiple tasks in mesos and tested the same scenario, One of 
the executors printed expected shutdown message when I have multiple executors 
running. So I’m guessing executor exited before print out shutdown message. 
And the trigger of executor exiting may happen parallel with shutdown. That’s 
why executor could print shutdown message sometime if it exited after receiving 
shutdown message.
What could be the trigger of executor exiting then? 

Many thanks,
Jiamin



> On Jul 17, 2017, at 4:56 PM, Vinod Kone <vinodk...@apache.org> wrote:
> 
> From the snippet below, it looks like the executor exited by itself right
> after the agent sent it a shutdown message. The cleanup you see happens
> immediately *after* the containerizer sees that the executor container has
> exited; so grace period doesn't come into play.
> 
> I0713 11:24:16.831954 23241 containerizer.cpp:2313] Container
>> c5b88043-6e46-4bde-bb8d-96dbaafe0e99 has exited
>> I0713 11:24:16.832000 23241 containerizer.cpp:1950] Destroying container
>> c5b88043-6e46-4bde-bb8d-96dbaafe0e99 in RUNNING state
>> I0713 11:24:16.832098 23241 linux_launcher.cpp:498] Asked to destroy
>> container c5b88043-6e46-4bde-bb8d-96dbaafe0e99
>> 
>> 
> Are you sure your executor didn't get the shutdown message? Is logging
> enabled on the executor? It looks like it got the shutdown message and
> exited.
> 
> 
> 
> On Sun, Jul 16, 2017 at 11:41 PM, Zhu, Jiamin <jiam...@paypal.com.invalid>
> wrote:
> 
>> Correct some configuration.
>> 
>> agent_ping_timeout * max_agent_ping_timeouts = 75secs
>> 
>> Thx,
>> Jiamin
>> 
>> 
>> From: "Zhu, Jiamin" <jiam...@paypal.com>
>> Date: Saturday, July 15, 2017 at 9:29 PM
>> To: dev mesos <dev@mesos.apache.org>
>> Cc: Meghdoot Bhattacharya <mbhattacha...@paypal.com>
>> Subject: question
>> 
>> Hi all,
>> Hope you all doing well!
>> We got an issue about mesos executor. We shutdown mesos slave for couple
>> of mins(more than agent_reregister_timeout), mesos slave sent shutdown
>> message to executor when we brought mesos slave back, but we didn’t see any
>> expected log print out from executor side. And executor process was killed
>> really quick after mesos slave became online.
>> 
>> I would greatly appreciate if you can kindly point us what’s going on. Thx!
>> 
>> Here are relevant configuration and log
>> agent_reregister_timeout=75secs
>> executor_shutdown_grace_period=30secs
>> checkingpoint is enabled.
>> 
>> 
>> Log from mesos-slave
>> I0713 11:24:14.727005 23237 slave.cpp:3421] Re-registering executor
>> 'compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0'
>> of framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
>> I0713 11:24:14.727247 23237 slave.cpp:3634] Handling status update
>> TASK_RUNNING (UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for task
>> paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of
>> framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 from executor(1)@
>> 10.180.72.144:45285
>> I0713 11:24:14.728474 23237 memory.cpp:199] Updated
>> 'memory.soft_limit_in_bytes' to 4352MB for container
>> c5b88043-6e46-4bde-bb8d-96dbaafe0e99
>> I0713 11:24:14.728746 23238 status_update_manager.cpp:323] Received status
>> update TASK_RUNNING (UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for task
>> paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of
>> framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
>> I0713 11:24:14.728801 23238 status_update_manager.cpp:832] Checkpointing
>> UPDATE for status update TASK_RUNNING (UUID: 
>> ab4daeff-67f7-11e7-947f-42010ab44890)
>> for task paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0
>> of framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
>> W0713 11:24:14.728997 23238 slave.cpp:3997] Dropping status update
>> TASK_RUNNING (UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for task
>> paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of
>> framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 sent by status update
>> manager because the agent is in RECOVERING state
>> I0713 11:24:14.729033 23238 slave.cpp:3961] Sending acknowledgement for
>> status update TASK_RUNNING (UUID: ab4daeff-67f7-11e7-947f-42010ab44890)
>> for task paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0
>> of framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 to executor(1)@
>> 10.180.72.144:45285
>> I0713 11:24:14.729878 23237 cpu.cpp:101] Updated 'cpu.shares' to 1280
>> (cpus 1.25) for container c5b88043-6e46-4bde-bb8d-96dbaafe0e99
>> I0713 11:24:16.726043 23240 slave.cpp:3574] Cleaning up un-reregistered
>> executors
>> I0713 11:24:16.726153 23240 slave.cpp:5281] Finished recovery
>> I0713 11:24:16.726739 23240 slave.cpp:915] New master detected at
>> master@10.180.72.104:5050
>> I0713 11:24:16.726766 23240 slave.cpp:974] Authenticating with master
>> master@10.180.72.104:5050
>> I0713 11:24:16.726883 23240 slave.cpp:985] Using default CRAM-MD5
>> authenticatee
>> I0713 11:24:16.726934 23240 slave.cpp:947] Detecting new master
>> I0713 11:24:16.726963 23240 status_update_manager.cpp:177] Pausing sending
>> status updates
>> I0713 11:24:16.727108 23240 authenticatee.cpp:97] Initializing client SASL
>> I0713 11:24:16.728171 23240 authenticatee.cpp:121] Creating new client
>> SASL connection
>> I0713 11:24:16.730295 23241 authenticatee.cpp:213] Received SASL
>> authentication mechanisms: CRAM-MD5
>> I0713 11:24:16.730348 23241 authenticatee.cpp:239] Attempting to
>> authenticate with mechanism 'CRAM-MD5'
>> I0713 11:24:16.731009 23242 authenticatee.cpp:259] Received SASL
>> authentication step
>> I0713 11:24:16.731638 23244 authenticatee.cpp:299] Authentication success
>> I0713 11:24:16.731791 23243 slave.cpp:1069] Successfully authenticated
>> with master master@10.180.72.104:5050
>> I0713 11:24:16.746140 23242 slave.cpp:1217] Re-registered with master
>> master@10.180.72.104:5050
>> I0713 11:24:16.746206 23242 slave.cpp:1253] Forwarding total
>> oversubscribed resources {}
>> I0713 11:24:16.746295 23242 slave.cpp:2536] Shutting down framework
>> 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
>> I0713 11:24:16.746320 23242 slave.cpp:4860] Shutting down executor
>> 'compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0'
>> of framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 at executor(1)@
>> 10.180.72.144:45285
>> W0713 11:24:16.746402 23242 slave.cpp:2676] Ignoring updating pid for
>> framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 because it is
>> terminating
>> I0713 11:24:16.746435 23242 status_update_manager.cpp:184] Resuming
>> sending status updates
>> W0713 11:24:16.746443 23242 status_update_manager.cpp:191] Resending
>> status update TASK_RUNNING (UUID: ab4daeff-67f7-11e7-947f-42010ab44890)
>> for task paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0
>> of framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
>> I0713 11:24:16.746513 23242 slave.cpp:4051] Forwarding the update
>> TASK_RUNNING (UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for task
>> paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of
>> framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 to
>> master@10.180.72.104:5050
>> I0713 11:24:16.747684 23244 slave.cpp:4179] Got exited event for
>> executor(1)@10.180.72.144:45285
>> I0713 11:24:16.749622 23240 status_update_manager.cpp:395] Received status
>> update acknowledgement (UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for
>> task paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of
>> framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
>> I0713 11:24:16.749706 23240 status_update_manager.cpp:832] Checkpointing
>> ACK for status update TASK_RUNNING (UUID: 
>> ab4daeff-67f7-11e7-947f-42010ab44890)
>> for task paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0
>> of framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
>> I0713 11:24:16.831954 23241 containerizer.cpp:2313] Container
>> c5b88043-6e46-4bde-bb8d-96dbaafe0e99 has exited
>> I0713 11:24:16.832000 23241 containerizer.cpp:1950] Destroying container
>> c5b88043-6e46-4bde-bb8d-96dbaafe0e99 in RUNNING state
>> I0713 11:24:16.832098 23241 linux_launcher.cpp:498] Asked to destroy
>> container c5b88043-6e46-4bde-bb8d-96dbaafe0e99
>> I0713 11:24:16.832731 23241 linux_launcher.cpp:541] Using freezer to
>> destroy cgroup mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99
>> I0713 11:24:16.833940 23241 cgroups.cpp:2705] Freezing cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/5bb4c6ae725bea06daf6faeca2dfec7d058e504505c2bb6b54aba831ce5c3d27
>> I0713 11:24:16.833946 23244 cgroups.cpp:2705] Freezing cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/6c0550ac53dc5001953bea3591b1192833591aff59b1cf57e755b926d48cca12
>> I0713 11:24:16.834002 23243 cgroups.cpp:2705] Freezing cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/c0a8fa6f2fdaeb2963781ec39b738b665459911f07245c7254dddddbc568251c
>> I0713 11:24:16.834003 23242 cgroups.cpp:2705] Freezing cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99
>> I0713 11:24:16.837601 23244 cgroups.cpp:1439] Successfully froze cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/c0a8fa6f2fdaeb2963781ec39b738b665459911f07245c7254dddddbc568251c
>> after 1.716992ms
>> I0713 11:24:16.837877 23242 cgroups.cpp:1439] Successfully froze cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99 after
>> 1.916928ms
>> I0713 11:24:16.839395 23244 cgroups.cpp:2723] Thawing cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/c0a8fa6f2fdaeb2963781ec39b738b665459911f07245c7254dddddbc568251c
>> I0713 11:24:16.839664 23242 cgroups.cpp:2723] Thawing cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99
>> I0713 11:24:16.841399 23244 cgroups.cpp:1468] Successfully thawed cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/c0a8fa6f2fdaeb2963781ec39b738b665459911f07245c7254dddddbc568251c
>> after 1.956864ms
>> I0713 11:24:16.843209 23242 cgroups.cpp:1468] Successfully thawed cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99 after
>> 3.491072ms
>> I0713 11:24:16.937427 23240 cgroups.cpp:1439] Successfully froze cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/6c0550ac53dc5001953bea3591b1192833591aff59b1cf57e755b926d48cca12
>> after 103.37408ms
>> I0713 11:24:16.937711 23242 cgroups.cpp:1439] Successfully froze cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/5bb4c6ae725bea06daf6faeca2dfec7d058e504505c2bb6b54aba831ce5c3d27
>> after 103.67488ms
>> I0713 11:24:16.939801 23240 cgroups.cpp:2723] Thawing cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/6c0550ac53dc5001953bea3591b1192833591aff59b1cf57e755b926d48cca12
>> I0713 11:24:16.940266 23242 cgroups.cpp:2723] Thawing cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/5bb4c6ae725bea06daf6faeca2dfec7d058e504505c2bb6b54aba831ce5c3d27
>> I0713 11:24:16.942387 23240 cgroups.cpp:1468] Successfully thawed cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/6c0550ac53dc5001953bea3591b1192833591aff59b1cf57e755b926d48cca12
>> after 2.450176ms
>> I0713 11:24:16.943276 23242 cgroups.cpp:1468] Successfully thawed cgroup
>> /sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaaf
>> e0e99/5bb4c6ae725bea06daf6faeca2dfec7d058e504505c2bb6b54aba831ce5c3d27
>> after 2.972928ms
>> E0713 11:24:16.986686 23244 slave.cpp:4520] Termination of executor
>> 'compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0'
>> of framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 failed: Failed to
>> kill all processes in the container: Failed to kill tasks in nested
>> cgroups: Collect failed: Failed to kill all processes in cgroup: Failed to
>> read cgroups control 'cgroup.procs': 'mesos/c5b88043-6e46-4bde-bb8d-9
>> 6dbaafe0e99/c0a8fa6f2fdaeb2963781ec39b738b665459911f07245c7254dddddbc568251c'
>> is not a valid cgroup
>> I0713 11:24:16.986958 23244 slave.cpp:4646] Cleaning up executor
>> 'compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0'
>> of framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 at executor(1)@
>> 10.180.72.144:45285
>> I0713 11:24:16.987180 23242 gc.cpp:55] Scheduling
>> '/var/lib/mesos/slaves/3ae40afd-a4c7-4f34-8668-2f513762b2a6-
>> S87/frameworks/3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000/exe
>> cutors/compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-
>> 4856-9b31-534dd3da22d0/runs/c5b88043-6e46-4bde-bb8d-96dbaafe0e99' for gc
>> 6.99998857482667days in the future
>> I0713 11:24:16.987231 23242 gc.cpp:55] Scheduling
>> '/var/lib/mesos/slaves/3ae40afd-a4c7-4f34-8668-2f513762b2a6-
>> S87/frameworks/3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000/exe
>> cutors/compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0'
>> for gc 6.99998857391111days in the future
>> 
>> 

Reply via email to