Correct some configuration.

agent_ping_timeout * max_agent_ping_timeouts = 75secs

Thx,
Jiamin


From: "Zhu, Jiamin" <jiam...@paypal.com>
Date: Saturday, July 15, 2017 at 9:29 PM
To: dev mesos <dev@mesos.apache.org>
Cc: Meghdoot Bhattacharya <mbhattacha...@paypal.com>
Subject: question

Hi all,
Hope you all doing well!
We got an issue about mesos executor. We shutdown mesos slave for couple of 
mins(more than agent_reregister_timeout), mesos slave sent shutdown message to 
executor when we brought mesos slave back, but we didn’t see any expected log 
print out from executor side. And executor process was killed really quick 
after mesos slave became online.

I would greatly appreciate if you can kindly point us what’s going on. Thx!

Here are relevant configuration and log
agent_reregister_timeout=75secs
executor_shutdown_grace_period=30secs
checkingpoint is enabled.


Log from mesos-slave
I0713 11:24:14.727005 23237 slave.cpp:3421] Re-registering executor 
'compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0' of 
framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
I0713 11:24:14.727247 23237 slave.cpp:3634] Handling status update TASK_RUNNING 
(UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for task 
paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of framework 
3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 from executor(1)@10.180.72.144:45285
I0713 11:24:14.728474 23237 memory.cpp:199] Updated 
'memory.soft_limit_in_bytes' to 4352MB for container 
c5b88043-6e46-4bde-bb8d-96dbaafe0e99
I0713 11:24:14.728746 23238 status_update_manager.cpp:323] Received status 
update TASK_RUNNING (UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for task 
paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of framework 
3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
I0713 11:24:14.728801 23238 status_update_manager.cpp:832] Checkpointing UPDATE 
for status update TASK_RUNNING (UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for 
task paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of 
framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
W0713 11:24:14.728997 23238 slave.cpp:3997] Dropping status update TASK_RUNNING 
(UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for task 
paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of framework 
3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 sent by status update manager because 
the agent is in RECOVERING state
I0713 11:24:14.729033 23238 slave.cpp:3961] Sending acknowledgement for status 
update TASK_RUNNING (UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for task 
paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of framework 
3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 to executor(1)@10.180.72.144:45285
I0713 11:24:14.729878 23237 cpu.cpp:101] Updated 'cpu.shares' to 1280 (cpus 
1.25) for container c5b88043-6e46-4bde-bb8d-96dbaafe0e99
I0713 11:24:16.726043 23240 slave.cpp:3574] Cleaning up un-reregistered 
executors
I0713 11:24:16.726153 23240 slave.cpp:5281] Finished recovery
I0713 11:24:16.726739 23240 slave.cpp:915] New master detected at 
master@10.180.72.104:5050
I0713 11:24:16.726766 23240 slave.cpp:974] Authenticating with master 
master@10.180.72.104:5050
I0713 11:24:16.726883 23240 slave.cpp:985] Using default CRAM-MD5 authenticatee
I0713 11:24:16.726934 23240 slave.cpp:947] Detecting new master
I0713 11:24:16.726963 23240 status_update_manager.cpp:177] Pausing sending 
status updates
I0713 11:24:16.727108 23240 authenticatee.cpp:97] Initializing client SASL
I0713 11:24:16.728171 23240 authenticatee.cpp:121] Creating new client SASL 
connection
I0713 11:24:16.730295 23241 authenticatee.cpp:213] Received SASL authentication 
mechanisms: CRAM-MD5
I0713 11:24:16.730348 23241 authenticatee.cpp:239] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I0713 11:24:16.731009 23242 authenticatee.cpp:259] Received SASL authentication 
step
I0713 11:24:16.731638 23244 authenticatee.cpp:299] Authentication success
I0713 11:24:16.731791 23243 slave.cpp:1069] Successfully authenticated with 
master master@10.180.72.104:5050
I0713 11:24:16.746140 23242 slave.cpp:1217] Re-registered with master 
master@10.180.72.104:5050
I0713 11:24:16.746206 23242 slave.cpp:1253] Forwarding total oversubscribed 
resources {}
I0713 11:24:16.746295 23242 slave.cpp:2536] Shutting down framework 
3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
I0713 11:24:16.746320 23242 slave.cpp:4860] Shutting down executor 
'compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0' of 
framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 at 
executor(1)@10.180.72.144:45285
W0713 11:24:16.746402 23242 slave.cpp:2676] Ignoring updating pid for framework 
3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 because it is terminating
I0713 11:24:16.746435 23242 status_update_manager.cpp:184] Resuming sending 
status updates
W0713 11:24:16.746443 23242 status_update_manager.cpp:191] Resending status 
update TASK_RUNNING (UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for task 
paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of framework 
3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
I0713 11:24:16.746513 23242 slave.cpp:4051] Forwarding the update TASK_RUNNING 
(UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for task 
paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of framework 
3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 to master@10.180.72.104:5050
I0713 11:24:16.747684 23244 slave.cpp:4179] Got exited event for 
executor(1)@10.180.72.144:45285
I0713 11:24:16.749622 23240 status_update_manager.cpp:395] Received status 
update acknowledgement (UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for task 
paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of framework 
3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
I0713 11:24:16.749706 23240 status_update_manager.cpp:832] Checkpointing ACK 
for status update TASK_RUNNING (UUID: ab4daeff-67f7-11e7-947f-42010ab44890) for 
task paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0 of 
framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000
I0713 11:24:16.831954 23241 containerizer.cpp:2313] Container 
c5b88043-6e46-4bde-bb8d-96dbaafe0e99 has exited
I0713 11:24:16.832000 23241 containerizer.cpp:1950] Destroying container 
c5b88043-6e46-4bde-bb8d-96dbaafe0e99 in RUNNING state
I0713 11:24:16.832098 23241 linux_launcher.cpp:498] Asked to destroy container 
c5b88043-6e46-4bde-bb8d-96dbaafe0e99
I0713 11:24:16.832731 23241 linux_launcher.cpp:541] Using freezer to destroy 
cgroup mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99
I0713 11:24:16.833940 23241 cgroups.cpp:2705] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/5bb4c6ae725bea06daf6faeca2dfec7d058e504505c2bb6b54aba831ce5c3d27
I0713 11:24:16.833946 23244 cgroups.cpp:2705] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/6c0550ac53dc5001953bea3591b1192833591aff59b1cf57e755b926d48cca12
I0713 11:24:16.834002 23243 cgroups.cpp:2705] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/c0a8fa6f2fdaeb2963781ec39b738b665459911f07245c7254dddddbc568251c
I0713 11:24:16.834003 23242 cgroups.cpp:2705] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99
I0713 11:24:16.837601 23244 cgroups.cpp:1439] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/c0a8fa6f2fdaeb2963781ec39b738b665459911f07245c7254dddddbc568251c
 after 1.716992ms
I0713 11:24:16.837877 23242 cgroups.cpp:1439] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99 after 
1.916928ms
I0713 11:24:16.839395 23244 cgroups.cpp:2723] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/c0a8fa6f2fdaeb2963781ec39b738b665459911f07245c7254dddddbc568251c
I0713 11:24:16.839664 23242 cgroups.cpp:2723] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99
I0713 11:24:16.841399 23244 cgroups.cpp:1468] Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/c0a8fa6f2fdaeb2963781ec39b738b665459911f07245c7254dddddbc568251c
 after 1.956864ms
I0713 11:24:16.843209 23242 cgroups.cpp:1468] Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99 after 
3.491072ms
I0713 11:24:16.937427 23240 cgroups.cpp:1439] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/6c0550ac53dc5001953bea3591b1192833591aff59b1cf57e755b926d48cca12
 after 103.37408ms
I0713 11:24:16.937711 23242 cgroups.cpp:1439] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/5bb4c6ae725bea06daf6faeca2dfec7d058e504505c2bb6b54aba831ce5c3d27
 after 103.67488ms
I0713 11:24:16.939801 23240 cgroups.cpp:2723] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/6c0550ac53dc5001953bea3591b1192833591aff59b1cf57e755b926d48cca12
I0713 11:24:16.940266 23242 cgroups.cpp:2723] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/5bb4c6ae725bea06daf6faeca2dfec7d058e504505c2bb6b54aba831ce5c3d27
I0713 11:24:16.942387 23240 cgroups.cpp:1468] Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/6c0550ac53dc5001953bea3591b1192833591aff59b1cf57e755b926d48cca12
 after 2.450176ms
I0713 11:24:16.943276 23242 cgroups.cpp:1468] Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos/c5b88043-6e46-4bde-bb8d-96dbaafe0e99/5bb4c6ae725bea06daf6faeca2dfec7d058e504505c2bb6b54aba831ce5c3d27
 after 2.972928ms
E0713 11:24:16.986686 23244 slave.cpp:4520] Termination of executor 
'compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0' of 
framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 failed: Failed to kill all 
processes in the container: Failed to kill tasks in nested cgroups: Collect 
failed: Failed to kill all processes in cgroup: Failed to read cgroups control 
'cgroup.procs': 'mesos/c5b88043-6e46-4bde-bb8d-9
6dbaafe0e99/c0a8fa6f2fdaeb2963781ec39b738b665459911f07245c7254dddddbc568251c' 
is not a valid cgroup
I0713 11:24:16.986958 23244 slave.cpp:4646] Cleaning up executor 
'compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0' of 
framework 3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000 at 
executor(1)@10.180.72.144:45285
I0713 11:24:16.987180 23242 gc.cpp:55] Scheduling 
'/var/lib/mesos/slaves/3ae40afd-a4c7-4f34-8668-2f513762b2a6-S87/frameworks/3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000/executors/compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0/runs/c5b88043-6e46-4bde-bb8d-96dbaafe0e99'
 for gc 6.99998857482667days in the future
I0713 11:24:16.987231 23242 gc.cpp:55] Scheduling 
'/var/lib/mesos/slaves/3ae40afd-a4c7-4f34-8668-2f513762b2a6-S87/frameworks/3ae40afd-a4c7-4f34-8668-2f513762b2a6-0000/executors/compose-paypal-iptest-raptorapp-a-0-c1e7663a-cc2e-4856-9b31-534dd3da22d0'
 for gc 6.99998857391111days in the future

Reply via email to