logs? Also, what version of Mesos?

@vinodkone
Sent from my mobile

On May 15, 2013, at 12:00 AM, 王瑜 <[email protected]> wrote:

> Hi Ben,
>
> I think the problem is that Mesos found the executor at
> hdfs://master/user/mesos/hadoop.tar.gz but did not download it, and so did
> not use it. Because Mesos found the executor, it did not report an error and
> just updated the task status to lost; but because it did not use the
> executor, the executor directory contains nothing!
>
> I am not very familiar with the source code, so I do not know why Mesos
> cannot use the executor, and I do not know whether my analysis is right.
> Thanks very much for your help!
>
> Wang Yu
>
> From: 王瑜
> Sent: 2013-05-15 11:04
> To: mesos-dev
> Cc: Benjamin Mahler
> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
>
> Hi Ben,
>
> I have rerun the test and checked the log directory again; it is still
> empty, the same as before.
> I think the problem is with my executor, but I do not know how to get the
> executor to work. The logs follow.
> The slave logs "Asked to update resources for an unknown/killed executor".
> Why does it always kill the executor?
>
> 1. I opened all the executor directories, but all of them are empty. I do not
> know what happened to them.
> [root@slave1 logs]# cd /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
> total 0
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
> . ..
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
> 2. I added "--isolation=cgroups" for the slaves, but it still does not work.
> Tasks are always lost. There is no longer any error, though, so I still do
> not know what happened to the executor. The log from one slave follows.
> Please help me, thanks very much!
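A note on reading the slave log quoted in this message: its header gives the line format as `[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg`, and the isolator reports "terminated with status 256" where the slave later reports "exited with status 1". The two numbers look consistent if 256 is a raw Linux wait status, but that is an assumption and is not confirmed anywhere in the thread. Below is a minimal Python sketch of splitting such a line into fields and decoding the status. The function names are made up for illustration and are not part of Mesos.

```python
import re

# Prefix described in the log header:
#   [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)"
)


def parse_glog_line(text):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = GLOG_LINE.match(text.strip())
    return match.groupdict() if match else None


def exit_code_from_wait_status(status):
    """Decode a raw wait status, assuming the usual Linux encoding.

    Assumption: a zero low 7 bits means a normal exit, and the exit code
    is then stored in bits 8-15. Anything else returns None.
    """
    if status & 0x7F:
        return None  # terminated by a signal or stopped, not a normal exit
    return (status >> 8) & 0xFF


# One entry from the quoted log, with the mail wrapping undone.
sample = (
    "I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor "
    "executor_Task_Tracker_0 of framework "
    "201305130913-33597632-5050-3893-0000 terminated with status 256"
)

entry = parse_glog_line(sample)
raw_status = int(re.search(r"terminated with status (\d+)", entry["msg"]).group(1))
print(entry["file"], entry["line"], exit_code_from_wait_status(raw_status))
```

Under that assumption the executor command exited on its own with code 1 about eleven seconds after it was forked, so the "Killing executor" line would be cleanup after the exit, not the cause of it.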
>
> mesos-slave.INFO
> Log file created at: 2013/05/13 09:12:54
> Running on machine: slave1
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
> I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by root
> I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
> I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on
> 1)@192.168.0.3:36668
> I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24;
> mem=63356; ports=[31000-32000]; disk=29143
> I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as
> cgroups hierarchy root
> I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at
> [email protected]:5050
> I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file
> '/home/mesos/build/logs/mesos-slave.INFO'
> I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master
> detected at [email protected]:5050
> I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator
> I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
> I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given
> slave ID 201305130913-33597632-5050-3893-0
> I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
> I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
> I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
> I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
> I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
> I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
> I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
> I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%. Max
> allowed age: 5.11days
> I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%. Max
> allowed age: 5.11days
> I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%. Max
> allowed age: 5.11days
> I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task Task_Tracker_0
> for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated 'cpu.shares' to
> 1024 for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_0 of
> framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening for
> OOM events for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at =
> 24552
> I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task Task_Tracker_1
> for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated 'cpu.shares' to
> 1024 for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_1 of
> framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening for
> OOM events for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at =
> 24628
> I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor
> executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> terminated with status 256
> I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor
> executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is
> triggered for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM notifier
> for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> after 1 attempts
> I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully destroyed
> cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.477439 24190 slave.cpp:1479] Executor
> 'executor_Task_Tracker_0' of framework 201305130913-33597632-5050-3893-0000
> has exited with status 1
> I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update TASK_LOST
> from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update
> TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 to the status update manager
> I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update
> resources for an unknown/killed executor
> I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received status
> update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating
> StatusUpdate stream for task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling UPDATE
> for status update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.481107 24185 status_update_manager.cpp:289] Forwarding status
> update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 to the master at [email protected]:5050
> I0513 09:16:34.487007 24194 slave.cpp:979] Got acknowledgement of status
> update for task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487257 24185 status_update_manager.cpp:314] Received status
> update acknowledgement for task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487412 24185 status_update_manager.hpp:314] Handling ACK for
> status update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487547 24185 status_upda
