Now that you've uploaded the executor, can you send us the master and slave logs? Also, on a slave, can you look in an executor's run directory to see what's in stderr?
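If it helps, here is a minimal Python sketch (not part of Mesos; the function names are made up for illustration) that pulls each executor's run directory out of the slave log's "Launching ..." lines and reports what is in it, including stderr. It assumes each log entry sits on a single line in the log file and follows the format of the sample line in this message.

```python
import re
from pathlib import Path

# Assumed format of the slave's launch line:
#   "... Launching <executor id> (<command>) in <run dir> with resources ..."
LAUNCH_RE = re.compile(
    r"Launching (?P<executor>\S+) \((?P<command>.*?)\) in (?P<run_dir>/\S+) with resources"
)


def find_run_dirs(log_text):
    """Map each executor id to the run directory named in its 'Launching' line."""
    return {m.group("executor"): m.group("run_dir") for m in LAUNCH_RE.finditer(log_text)}


def show_run_dir(run_dir):
    """Rough equivalent of `ls` plus `cat stderr` for one run directory.

    Returns (sorted file names, stderr text), where stderr text is None
    if the directory has no stderr file.
    """
    path = Path(run_dir)
    names = sorted(p.name for p in path.iterdir()) if path.is_dir() else []
    stderr_file = path / "stderr"
    stderr_text = stderr_file.read_text(errors="replace") if stderr_file.is_file() else None
    return names, stderr_text
```

To use it, read the slave's INFO log into a string, pass it to find_run_dirs, and call show_run_dir on each path. An empty file list with no stderr would suggest the executor bundle was never fetched into the run directory.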
For example, in the slave log you'll see a line like the following:

I0513 09:16:47.082861 24194 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_4 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_4_tag_8a4dd631-1ec0-4946-a1bc-0644a7238e3c

Based on the above, you'll want to check what's inside that directory:

$ cd /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
$ ls
$ cat stderr

Thanks!

On Sun, May 12, 2013 at 8:45 PM, 王瑜 <[email protected]> wrote:

> Yes, I also updated mapred-site.xml, but it still does not work.
>
> I am using the git version; I just downloaded it using git clone git://git.apache.org/mesos.git
>
> $ cd mesos
> $ ./bootstrap
> $ ./configure
> $ make
> $ cd hadoop
> $ make hadoop-0.20.205.0
>
> Then I deployed it on the real cluster.
>
> I really do not know where the problem is; please help me with it.
>
> Wang Yu
>
> From: Vinod Kone
> Sent: 2013-05-13 11:30
> To: [email protected]
> Cc: mesos-dev
> Subject: Re: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> Hmm. You definitely need the right extension but not the "hadoop" name. I'm assuming you also updated the file name in mapred-site.xml?
>
> Also, I'm surprised that the slave logs do not show info about downloading the executor. What version of mesos are you running?
>
> @vinodkone
> Sent from my mobile
>
> On May 12, 2013, at 7:59 PM, 王瑜 <[email protected]> wrote:
>
> > I have uploaded the right file using:
> > [root@master hadoop-0.20.205.0]# hadoop fs -mkdir mesos
> > [root@master hadoop-0.20.205.0]# hadoop fs -copyFromLocal /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz /user/mesos/mesos-executor
> >
> > I have tried adding a file extension ("/user/mesos/mesos-executor" -> "/user/mesos/mesos-executor.tar.gz"), but it still does not work. Does the file have to be named hadoop.tar.gz?
> >
> > Wang Yu
> >
> > From: Vinod Kone
> > Sent: 2013-05-13 10:42
> > To: [email protected]; wangyu
> > Cc: Benjamin Mahler
> > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> >>
> >> <property>
> >> <name>mapred.mesos.executor</name>
> >> # <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> >> <value>hdfs://master/user/mesos/mesos-executor</value>
> >> </property>
> >
> > The mapred.mesos.executor property looks incorrect. The value should be the location where you uploaded the "hadoop.tar.gz" bundle generated by TUTORIAL.sh or make hadoop. You can find the generated "hadoop.tar.gz" bundle in the hadoop build directory. Upload the bundle to an HDFS location and set the above property to that location.
> >
> > vinod
> >
> >> > /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> >>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
> >>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
> >>> total 0
> >>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
> >>> . ..
> >>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
> >>> 2. I added "--isolation=cgroups" for slaves, but it still does not work. Tasks are always lost.
> >>> But there is no error any more, and I still do not know what happened to the executor. Logs on one slave are as follows. Please help me, thanks very much!
> >>>
> >>> mesos-slave.INFO
> >>> Log file created at: 2013/05/13 09:12:54
> >>> Running on machine: slave1
> >>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> >>> I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
> >>> I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by root
> >>> I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
> >>> I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@ 192.168.0.3:36668
> >>> I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24; mem=63356; ports=[31000-32000]; disk=29143
> >>> I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as cgroups hierarchy root
> >>> I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at [email protected]:5050
> >>> I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file '/home/mesos/build/logs/mesos-slave.INFO'
> >>> I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master detected at [email protected]:5050
> >>> I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator
> >>> I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
> >>> I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given slave ID 201305130913-33597632-5050-3893-0
> >>> I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
> >>> I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
> >>> I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
> >>> I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
> >>> I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
> >>> I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
> >>> I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
> >>> I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
> >>> I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
> >>> I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
> >>> I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> >>> I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> >>> I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495 with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> >>> I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at = 24552
> >>> I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> >>> I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> >>> I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> >>> I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> >>> I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at = 24628
> >>> I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
> >>> I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495 after 1 attempts
> >>> I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.477439 24190 slave.cpp:1479] Executor 'executor_Task_Tracker_0' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1
> >>> I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 to the status update manager
> >>> I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update resources for an unknown/killed executor
> >>> I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating StatusUpdate stream for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling UPDATE for status update TASK_
