Hey Guodong,

So, it looks like the executor for Task_Tracker_242 (executor_Task_Tracker_242) did not register with the slave within 1 minute, so the slave deemed it unhealthy and decided to terminate it. At this point the executor should have received a kill signal from the slave. Do you see anything like that in the slave or executor logs?
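A quick way to look for the pattern described above (a task is queued for an executor, the executor never registers, and the slave terminates it after the 1-minute timeout) is to scan the slave log. This is a minimal sketch that assumes the slave log line formats quoted further down in this thread. `scan_registrations` and the three regexes are hypothetical helpers, not part of Mesos, and other Mesos versions may word these messages differently.

```python
import re
import sys

# Regexes modeled on the slave log lines quoted in this thread (assumption:
# other Mesos versions may word these messages differently).
QUEUED = re.compile(r"Queuing task '([^']+)' for executor (\S+) of framework")
REGISTERED = re.compile(r"Got registration for executor '([^']+)' of framework")
TIMED_OUT = re.compile(
    r"Terminating executor (\S+) of framework \S+ because it did not register")


def scan_registrations(lines):
    """Hypothetical helper: map each executor ID seen in a slave log to
    'pending' (task queued, no outcome seen yet), 'registered', or 'timed out'."""
    status = {}
    for line in lines:
        m = QUEUED.search(line)
        if m:
            # Don't overwrite an outcome already recorded for this executor.
            status.setdefault(m.group(2), "pending")
            continue
        m = REGISTERED.search(line)
        if m:
            status[m.group(1)] = "registered"
            continue
        m = TIMED_OUT.search(line)
        if m:
            status[m.group(1)] = "timed out"
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_registrations.py /path/to/slave.log
    with open(sys.argv[1]) as log:
        for executor, state in scan_registrations(log).items():
            print(f"{executor}: {state}")
```

On the excerpts quoted in this thread, executor_Task_Tracker_224 and executor_Task_Tracker_230 have "Got registration" lines, while executor_Task_Tracker_242 (and executor_Task_Tracker_238) only ever reach the "did not register within 1mins" termination.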
On Mon, Jul 8, 2013 at 11:30 PM, 王国栋 <wangg...@gmail.com> wrote:
> Hi Vinod,
>
> I am using code from trunk; I think the latest commit is from July 1st.
> I will grep the master logs and send them in another mail.
>
> The task "Task_Tracker_242" is stuck in STAGING. I think "Task_Tracker_224"
> and "Task_Tracker_230" exited successfully, but it is strange that there
> are so many "Failed to collect resource usage..." warnings.
>
> I0709 00:46:11.288698 11002 slave.cpp:739] Got assigned task Task_Tracker_242 for framework 201307040929-252063498-5050-27411-0000
> I0709 00:46:11.289136 11002 slave.cpp:837] Launching task Task_Tracker_242 for framework 201307040929-252063498-5050-27411-0000
> I0709 00:46:11.291296 11002 paths.hpp:303] Created executor directory '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
> I0709 00:46:11.291647 11002 slave.cpp:948] Queuing task 'Task_Tracker_242' for executor executor_Task_Tracker_242 of framework '201307040929-252063498-5050-27411-0000
> I0709 00:46:11.292162 11002 slave.cpp:511] Successfully attached file '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
> W0709 00:46:12.197242 10992 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:16.100548 10994 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:17.197463 11001 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:21.101570 11002 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:22.198303 11005 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:26.102522 11002 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:27.199403 10998 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:31.103610 10998 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:32.200248 11001 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:36.104547 11004 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:37.201236 10991 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:41.105523 10997 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:42.202250 10991 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> I0709 00:46:45.283098 11002 slave.cpp:2511] Current usage 57.43%. Max allowed age: 2.279812884766227days
> W0709 00:46:46.106760 10994 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:47.203474 10993 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:51.107544 11006 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:52.204280 10997 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:56.108530 10995 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:57.205417 10997 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:47:01.109284 10997 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:47:02.206368 11002 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> I0709 00:47:05.288517 11002 slave.cpp:2463] Terminating executor executor_Task_Tracker_238 of framework 201307040929-252063498-5050-27411-0000 because it did not register within 1mins
> W0709 00:47:06.110532 11005 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:47:07.207320 10997 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:47:11.111778 10996 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> I0709 00:47:11.292485 10991 slave.cpp:2463] Terminating executor executor_Task_Tracker_242 of framework 201307040929-252063498-5050-27411-0000 because it did not register within 1mins
>
> Guodong
>
>
> On Tue, Jul 9, 2013 at 2:21 PM, Vinod Kone <vinodk...@gmail.com> wrote:
>
> > Hey Guodong, which of these tasks is stuck in STAGING? The corresponding
> > master logs would also be helpful here. Also, which version of Mesos are
> > you running?
> >
> >
> > On Mon, Jul 8, 2013 at 11:02 PM, 王国栋 <wangg...@gmail.com> wrote:
> >
> > > Interestingly, the slave log also contains these entries:
> > >
> > > I0709 00:33:43.833853 11002 slave.cpp:996] Asked to kill task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:43.835552 11006 slave.cpp:996] Asked to kill task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:43.972771 10994 slave.cpp:1692] Handling status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:27786
> > > I0709 00:33:43.973132 10994 status_update_manager.cpp:290] Received status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > > I0709 00:33:43.973192 10994 status_update_manager.cpp:336] Forwarding status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > I0709 00:33:43.973846 11005 slave.cpp:1809] Sending acknowledgement for status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > I0709 00:33:43.974591 11000 status_update_manager.cpp:360] Received status update acknowledgement 372081cc-edf2-4183-a461-9345ab6d279c for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:43.974652 11000 status_update_manager.cpp:481] Cleaning up status update stream for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:44.090603 11003 slave.cpp:1692] Handling status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:2310
> > > I0709 00:33:44.090860 11003 status_update_manager.cpp:290] Received status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > > I0709 00:33:44.090973 11003 status_update_manager.cpp:336] Forwarding status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > I0709 00:33:44.091279 11003 slave.cpp:1809] Sending acknowledgement for status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > I0709 00:33:44.093286 11003 status_update_manager.cpp:360] Received status update acknowledgement 61d5775a-2375-412a-a5a4-80ab55163d88 for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:44.093359 11003 status_update_manager.cpp:481] Cleaning up status update stream for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:45.259831 10997 slave.cpp:2511] Current usage 57.44%. Max allowed age: 2.279168852469954days
> > > W0709 00:33:45.949470 10996 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:33:47.063181 11005 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:33:50.950412 11000 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:33:52.063576 10993 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:33:55.951427 11003 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:33:57.064575 10998 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:00.952390 11003 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:02.065218 10998 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:05.953456 10995 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:07.066515 10995 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:10.954479 10998 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:12.067471 11005 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:15.955461 10996 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:17.068209 10996 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > >
> > > Guodong
> > >
> > >
> > > On Tue, Jul 9, 2013 at 1:59 PM, 王国栋 <wangg...@gmail.com> wrote:
> > >
> > > > Hi Ben,
> > > >
> > > > I ran into the same issue. It also happens with our Hadoop framework.
> > > > The slave log looks like this. I think the workload on the node was
> > > > very high at that time.
> > > >
> > > > I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:44.255221 10999 gc.cpp:84] Unscheduling '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000' for removal
> > > > I0708 23:36:44.256206 11001 slave.cpp:837] Launching task Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:44.258117 11001 paths.hpp:303] Created executor directory '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > > I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1 with resources cpus=1; mem=1280' for framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task 'Task_Tracker_224' for executor executor_Task_Tracker_224 of framework '201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked executor at 2220
> > > > I0708 23:36:44.261787 10996 slave.cpp:511] Successfully attached file '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > > I0708 23:36:44.580497 10996 slave.cpp:2511] Current usage 57.21%. Max allowed age: 2.295155852123924days
> > > > I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for executor 'executor_Task_Tracker_224' of framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:44.751095 11002 slave.cpp:1510] Flushing queued task Task_Tracker_224 for executor 'executor_Task_Tracker_224' of framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:2310
> > > > I0708 23:36:46.144745 11006 status_update_manager.cpp:290] Received status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > > > I0708 23:36:46.144821 11006 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:46.145076 11006 status_update_manager.cpp:336] Forwarding status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > > I0708 23:36:46.145882 10997 slave.cpp:1809] Sending acknowledgement for status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > > I0708 23:36:46.146870 10993 status_update_manager.cpp:360] Received status update acknowledgement 364ee347-f6a2-4c7b-8702-460aa0ece579 for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.258347 11005 slave.cpp:739] Got assigned task Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.259472 11005 slave.cpp:837] Launching task Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.261641 11005 paths.hpp:303] Created executor directory '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > > I0708 23:36:50.262265 11005 slave.cpp:948] Queuing task 'Task_Tracker_230' for executor executor_Task_Tracker_230 of framework '201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.262557 11005 process_isolator.cpp:99] Launching executor_Task_Tracker_230 (cd hadoop && ./bin/mesos-executor) in /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd with resources cpus=1; mem=1280' for framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.265396 10999 slave.cpp:511] Successfully attached file '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > > I0708 23:36:50.265419 11005 process_isolator.cpp:161] Forked executor at 2851
> > > > I0708 23:36:50.835607 10995 slave.cpp:1395] Got registration for executor 'executor_Task_Tracker_230' of framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.836174 10995 slave.cpp:1510] Flushing queued task Task_Tracker_230 for executor 'executor_Task_Tracker_230' of framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:54.617856 10994 slave.cpp:1692] Handling status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:27786
> > > > I0708 23:36:54.618275 10994 status_update_manager.cpp:290] Received status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > > > I0708 23:36:54.618326 10994 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:54.618443 10994 status_update_manager.cpp:336] Forwarding status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > > I0708 23:36:54.619137 10994 slave.cpp:1809] Sending acknowledgement for status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > > I0708 23:36:54.637682 10994 status_update_manager.cpp:360] Received status update acknowledgement 7753252d-c90b-4b0d-adca-7c97f38f692e for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:37:44.583014 11002 slave.cpp:2511] Current usage 57.23%. Max allowed age: 2.293704423241597days
> > > > I0708 23:38:44.585233 11003 slave.cpp:2511] Current usage 57.23%. Max allowed age: 2.293703916528542days
> > > > I0708 23:39:44.599442 11006 slave.cpp:2511] Current usage 57.23%. Max allowed age: 2.293639867998055days
> > > > I0708 23:40:44.603996 10997 slave.cpp:2511] Current usage 57.24%. Max allowed age: 2.292921551567535days
> > > > I0708 23:41:44.608608 11006 slave.cpp:2511] Current usage 57.26%. Max allowed age: 2.291521098018820days
> > > > I0708 23:42:44.609956 10992 slave.cpp:2511] Current usage 57.23%. Max allowed age: 2.293668041244063days
> > > > I0708 23:43:44.682621 11000 slave.cpp:2511] Current usage 57.24%. Max allowed age: 2.292935638190544days
> > > > I0708 23:44:44.684306 10993 slave.cpp:2511] Current usage 57.24%. Max allowed age: 2.292916079066516days
> > > > I0708 23:45:44.686172 11001 slave.cpp:2511] Current usage 57.26%. Max allowed age: 2.291485324076945days
> > > > I0708 23:46:44.699095 10995 slave.cpp:2511] Current usage 57.23%. Max allowed age: 2.293641894850289days
> > > > I0708 23:47:44.721156 10998 slave.cpp:2511] Current usage 57.23%. Max allowed age: 2.293629429709074days
> > > > I0708 23:48:44.779767 10992 slave.cpp:2511] Current usage 57.24%. Max allowed age: 2.293525350847025days
> > > > I0708 23:49:44.812389 11004 slave.cpp:2511] Current usage 57.24%. Max allowed age: 2.292909289111539days
> > > > I0708 23:50:44.814146 10999 slave.cpp:2511] Current usage 57.27%. Max allowed age: 2.291438098419977days
> > > > I0708 23:51:44.814877 11005 slave.cpp:2511] Current usage 57.23%. Max allowed age: 2.293635104895313days
> > > > I0708 23:52:44.818620 10998 slave.cpp:2511] Current usage 57.24%. Max allowed age: 2.292983775931019days
> > > > I0708 23:53:44.829911 10997 slave.cpp:2511] Current usage 57.33%. Max allowed age: 2.286910009194236days
> > > > I0708 23:54:44.831307 10999 slave.cpp:2511] Current usage 57.33%. Max allowed age: 2.286909502481169days
> > > > I0708 23:55:44.902858 10994 slave.cpp:2511] Current usage 57.37%. Max allowed age: 2.284414244700093days
> > > > I0708 23:56:44.905398 11002 slave.cpp:2511] Current usage 57.42%. Max allowed age: 2.280636901540567days
> > > > I0708 23:57:44.933673 10991 slave.cpp:2511] Current usage 57.44%. Max allowed age: 2.279481899796968days
> > > > I0708 23:58:44.934840 11004 slave.cpp:2511] Current usage 57.48%. Max allowed age: 2.276566475548496days
> > > > I0708 23:59:44.936063 11001 slave.cpp:2511] Current usage 57.49%. Max allowed age: 2.275690368671817days
> > > > I0709 00:00:44.937433 11004 slave.cpp:2511] Current usage 57.50%. Max allowed age: 2.275057180034989days
> > > > I0709 00:01:44.938940 11001 slave.cpp:2511] Current usage 57.51%. Max allowed age: 2.273999467198449days
> > > > I0709 00:02:44.955103 10996 slave.cpp:2511] Current usage 57.52%. Max allowed age: 2.273472384275891days
> > > > I0709 00:03:44.956354 10993 slave.cpp:2511] Current usage 57.39%. Max allowed age: 2.282894612240220days
> > > > I0709 00:04:44.957926 10997 slave.cpp:2511] Current usage 57.40%. Max allowed age: 2.281966516603831days
> > > > I0709 00:05:44.969205 10996 slave.cpp:2511] Current usage 57.40%. Max allowed age: 2.281962260214144days
> > > > I0709 00:06:44.969987 10992 slave.cpp:2511] Current usage 57.40%. Max allowed age: 2.281791801941551days
> > > > I0709 00:07:44.977504 11004 slave.cpp:2511] Current usage 57.40%. Max allowed age: 2.281715288269849days
> > > > I0709 00:08:44.982868 10998 slave.cpp:2511] Current usage 57.40%. Max allowed age: 2.281699782850289days
> > > > I0709 00:09:44.997082 11000 slave.cpp:2511] Current usage 57.42%. Max allowed age: 2.280776044946192days
> > > > I0709 00:10:44.998754 10994 slave.cpp:2511] Current usage 57.42%. Max allowed age: 2.280772193926956days
> > > > I0709 00:11:44.999949 11002 slave.cpp:2511] Current usage 57.44%. Max allowed age: 2.279204525069213days
> > > > I0709 00:12:45.001539 10995 slave.cpp:2511] Current usage 57.47%. Max allowed age: 2.277132676719109days
> > > > I0709 00:13:45.002728 10992 slave.cpp:2511] Current usage 57.43%. Max allowed age: 2.280012428368322days
> > > > I0709 00:14:45.009699 10998 slave.cpp:2511] Current usage 57.48%. Max allowed age: 2.276733690857512days
> > > > I0709 00:15:45.013483 10996 slave.cpp:2511] Current usage 57.53%. Max allowed age: 2.272715152282546days
> > > > I0709 00:16:45.015496 10998 slave.cpp:2511] Current usage 57.57%. Max allowed age: 2.270354274804352days
> > > > I0709 00:17:45.016628 11000 slave.cpp:2511] Current usage 57.62%. Max allowed age: 2.266927678423322days
> > > > I0709 00:18:45.032670 11002 slave.cpp:2511] Current usage 57.65%. Max allowed age: 2.264218182361482days
> > > > I0709 00:19:45.043442 10998 slave.cpp:2511] Current usage 57.69%. Max allowed age: 2.261509598383137days
> > > > I0709 00:20:45.080648 10992 slave.cpp:2511] Current usage 57.72%. Max allowed age: 2.259379478031400days
> > > > I0709 00:21:45.081632 10995 slave.cpp:2511] Current usage 57.77%. Max allowed age: 2.255819920144039days
> > > > I0709 00:22:45.082593 11005 slave.cpp:2511] Current usage 57.81%. Max allowed age: 2.253314528101817days
> > > > I0709 00:23:45.193588 10997 slave.cpp:2511] Current usage 57.85%. Max allowed age: 2.250524870034248days
> > > > I0709 00:24:45.220617 10994 slave.cpp:2511] Current usage 57.90%. Max allowed age: 2.246784618270532days
> > > > I0709 00:25:45.241602 10992 slave.cpp:2511] Current usage 57.97%. Max allowed age: 2.242399422127049days
> > > > I0709 00:26:45.248977 11000 slave.cpp:2511] Current usage 58.00%. Max allowed age: 2.240250654734792days
> > > > I0709 00:27:45.250953 10993 slave.cpp:2511] Current usage 57.99%. Max allowed age: 2.240516983117894days
> > > > I0709 00:28:45.252694 10996 slave.cpp:2511] Current usage 58.06%. Max allowed age: 2.235834143724352days
> > > > I0709 00:29:45.254992 11003 slave.cpp:2511] Current usage 58.10%. Max allowed age: 2.233297436815162days
> > > > W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:15.719880 11003 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:16.755473 11003 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:20.720330 11003 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:21.766019 11003 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:25.721364 11003 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:26.768874 11003 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:30.722605 11003 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:31.770354 11003 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:35.724455 10992 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:36.788751 10992 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:40.745380 10992 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:41.789358 10992 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > I0709 00:30:45.256590 11004 slave.cpp:2511] Current usage 58.11%. Max allowed age: 2.232469873049410days
> > > >
> > > >
> > > > Guodong
> > > >
> > > >
> > > > On Tue, Jul 9, 2013 at 4:55 AM, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
> > > >
> > > > > Are these the unedited logs? I'd expect to see some logs from the
> > > > > process_isolator or cgroups_isolator in there.
> > > >>
> > > >> On Fri, Jul 5, 2013 at 2:38 PM, Brenden Matthews <brenden.matth...@airbedandbreakfast.com> wrote:
> > > >>
> > > >> > Hey guys,
> > > >> >
> > > >> > I'm currently having a problem where tasks get stuck in the staging state, even though according to the logs they should have been terminated. They hang indefinitely, or until I restart the slave. Below are a screenshot and logs. Also interesting are the 'Failed to collect resource usage ...' messages.
> > > >> >
> > > >> > [image: Inline image 2]
> > > >> >
> > > >> >> I0705 16:19:51.551512 9706 slave.cpp:739] Got assigned task ct:1373041190990:0:add_latest_reservation_survey_events_partition for framework chronos
> > > >> >> I0705 16:19:51.552150 9706 slave.cpp:837] Launching task ct:1373041190990:0:add_latest_reservation_survey_events_partition for framework chronos
> > > >> >> I0705 16:19:51.553956 9706 paths.hpp:303] Created executor directory '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> > > >> >> I0705 16:19:51.554576 9706 slave.cpp:948] Queuing task 'ct:1373041190990:0:add_latest_reservation_survey_events_partition' for executor ct:1373041190990:0:add_latest_reservation_survey_events_partition of framework 'chronos
> > > >> >> I0705 16:19:51.555027 9706 slave.cpp:511] Successfully attached file '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> > > >> >> I0705 16:19:54.048754 9724 slave.cpp:2530] Current usage 42.18%. Max allowed age: 22.955009563956388hrs
> > > >> >> W0705 16:19:54.108963 9724 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:19:59.110787 9729 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:04.112406 9704 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:09.114367 9705 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:14.116312 9706 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:19.118370 9699 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:24.120311 9701 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:29.122355 9700 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:34.123443 9722 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:39.125660 9718 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:44.127464 9724 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:49.129385 9725 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> I0705 16:20:51.555174 9703 slave.cpp:2482] Terminating executor ct:1373041190990:0:add_latest_reservation_survey_events_partition of framework chronos because it did not register within 1mins
> > > >> >> I0705 16:20:54.050434 9717 slave.cpp:2530] Current usage 42.18%. Max allowed age: 22.955009342481944hrs
> > > >> >> W0705 16:20:54.130730 9699 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:59.132472 9702 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:21:04.134557 9713 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:21:09.135619 9701 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded