1. If you look at a slave log, you can see that the process isolator launched the task and then notified the slave that it was lost. Can you look inside one of the executor directories? There should be an stderr file there. E.g.:

I0510 09:44:33.801655 7412 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1/frameworks/201305100938-33597632-5050-19520-0000/executors/executor_Task_Tracker_5/runs/2981a5c2-84e5-4868-9507-8aecb32ee163'

Look for lines like this in the slave logs and read the stderr file inside each directory they name. Can you report back with the contents?

2. Are you running on Linux? You may want to consider using --isolation=cgroups when starting your slaves. This uses Linux control groups to do process / cpu / memory isolation between executors running on the slave.

Thanks!

On Thu, May 9, 2013 at 7:07 PM, 王瑜 <[email protected]> wrote:

> Hi Ben,
>
> Logs for the mesos master and slaves are attached, thanks for helping me
> with this problem. I really appreciate your patient reply.
>
> Three servers: "master", "slave1", "slave5"
> Mesos master: "master"
> Mesos slaves: "master", "slave1", "slave5"
>
> ------------------------------
> Wang Yu
>
> *From:* Benjamin Mahler <[email protected]>
> *Sent:* 2013-05-10 07:22
> *To:* wangyu <[email protected]>
> *Cc:* mesos-dev <[email protected]>; Benjamin Mahler <[email protected]>
> *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> Ah I see them now, looks like you uploaded the NameNode logs? Can you
> upload the mesos-master and mesos-slave logs instead? What will be
> interesting here is what happened on the slave that is trying to run the
> TaskTracker.
>
> On Wed, May 8, 2013 at 8:32 PM, 王瑜 <[email protected]> wrote:
>
> > I uploaded them in the previous email, I will send them again. PS: Will
> > the email list reject the attachments?
> >
> > Can you see them?
> >
> > ------------------------------
> > Wang Yu
> >
> > *From:* Benjamin Mahler <[email protected]>
> > *Sent:* 2013-05-09 10:00
> > *To:* [email protected]; wangyu <[email protected]>
> > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> > Did you forget to attach them?
> >
> > On Wed, May 8, 2013 at 6:48 PM, 王瑜 <[email protected]> wrote:
> >
> > > OK.
> > > Logs are attached. I used Ctrl+C to stop the jobtracker when the
> > > TASK_LOST happened.
> > >
> > > Thanks very much for your help!
> > >
> > > ------------------------------
> > > Wang Yu
> > >
> > > *From:* Benjamin Mahler <[email protected]>
> > > *Sent:* 2013-05-09 01:23
> > > *To:* [email protected]
> > > *Cc:* wangyu <[email protected]>
> > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> > >
> > > Hey Brenden, are there any bugs in particular here that you're referring
> > > to?
> > >
> > > Wang, can you provide the logs for the JobTracker, the slave, and the
> > > master?
> > >
> > > On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <[email protected]> wrote:
> > >
> > > > You may want to try Airbnb's dist of Mesos:
> > > >
> > > > https://github.com/airbnb/mesos/tree/testing
> > > >
> > > > A good number of these Mesos bugs have been fixed but aren't yet merged
> > > > into upstream.
> > > >
> > > > On Mon, May 6, 2013 at 8:34 PM, 王瑜 <[email protected]> wrote:
> > > >
> > > > > The log on each slave for the lost task is: No executor found with ID:
> > > > > executor_Task_Tracker_XXX.
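
Several replies in this thread point at the per-run executor directories under the slave work directory and at the stderr file inside them. A minimal Python sketch for collecting those files in one pass is below. It assumes the layout shown in the "Created executor directory" log line (`slaves/*/frameworks/*/executors/*/runs/*`) and assumes that `runs/latest` is a symlink to one of the run directories; the function name `find_executor_stderr` and the default root `/tmp/mesos` are illustrative, not part of Mesos.

```python
from pathlib import Path


def find_executor_stderr(work_dir="/tmp/mesos"):
    """Return (run_dir, stderr_text_or_None) for every executor run found.

    Assumes the directory layout seen in the slave log:
    <work_dir>/slaves/<id>/frameworks/<id>/executors/<id>/runs/<run-id>
    """
    pattern = "slaves/*/frameworks/*/executors/*/runs/*"
    results = []
    for run_dir in sorted(Path(work_dir).glob(pattern)):
        # Skip the 'latest' entry (assumed to be a symlink) and stray files.
        if run_dir.is_symlink() or not run_dir.is_dir():
            continue
        stderr = run_dir / "stderr"
        text = stderr.read_text(errors="replace") if stderr.is_file() else None
        results.append((run_dir, text))
    return results


if __name__ == "__main__":
    for run_dir, text in find_executor_stderr():
        print(f"== {run_dir}")
        if text is None:
            print("(no stderr file; the executor may never have started)")
        else:
            print(text)
```

A run directory with no stderr file at all, like the empty `runs/latest` reported later in the thread, shows up with `None`, which suggests the executor was never launched rather than launched and crashed.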
> > > > >
> > > > > Wang Yu
> > > > >
> > > > > From: 王瑜
> > > > > Sent: 2013-05-07 11:13
> > > > > To: mesos-dev
> > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> > > > > Hi all,
> > > > >
> > > > > I have tried adding the file extension when uploading the executor, as
> > > > > well as in the conf file, but it still does not work.
> > > > >
> > > > > And I have looked at
> > > > > /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> > > > > but it is an empty directory.
> > > > >
> > > > > Are there any other logs I can read to find out why the TASK_LOST
> > > > > happened? I really need your help, thanks very much!
> > > > >
> > > > > Wang Yu
> > > > >
> > > > > From: Vinod Kone
> > > > > Sent: 2013-04-26 01:31
> > > > > To: [email protected]
> > > > > Cc: wangyu
> > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> > > > > Also, you could look at the executor logs (default:
> > > > > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why the
> > > > > TASK_LOST happened.
> > > > >
> > > > > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <[email protected]> wrote:
> > > > >
> > > > > Can you maintain the file extension? That is how mesos knows to
> > > > > extract it:
> > > > > hadoop fs -copyFromLocal
> > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > /user/mesos/mesos-executor.tar.gz
> > > > >
> > > > > Also make sure your mapred-site.xml has the extension as well.
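
The advice above about keeping the file extension can be checked mechanically before uploading. The following is only a sketch: the thread confirms that the extension (for example `.tar.gz`) is what tells Mesos to extract the file, but the exact set in `ARCHIVE_EXTENSIONS` is an assumption, and both helper names are hypothetical. The second example URI combines the `hdfs://master` prefix and the `.tar.gz` path that appear separately in the thread.

```python
# Assumed, illustrative list. The thread only states that the extension
# (for example .tar.gz) is how Mesos knows to extract the file.
ARCHIVE_EXTENSIONS = (".tar.gz", ".tgz", ".zip")


def has_archive_extension(uri):
    """True if the URI ends in one of the assumed archive extensions."""
    return uri.lower().endswith(ARCHIVE_EXTENSIONS)


def check_executor_uri(uri):
    """Return a short verdict for a mapred.mesos.executor value."""
    if has_archive_extension(uri):
        return f"ok: {uri}"
    return f"warning: {uri} has no archive extension, so it may not be extracted"


if __name__ == "__main__":
    # Value without an extension, as first configured in the thread.
    print(check_executor_uri("hdfs://master/user/mesos/mesos-executor"))
    # Value with the extension preserved (assumed combination, see lead-in).
    print(check_executor_uri("hdfs://master/user/mesos/mesos-executor.tar.gz"))
```

The same check applies to both places the path appears: the HDFS destination given to `hadoop fs -copyFromLocal` and the value in mapred-site.xml.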
> > > > >
> > > > > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <[email protected]> wrote:
> > > > >
> > > > > > Hi, Ben,
> > > > > >
> > > > > > I have tried as you said, but it still does not work.
> > > > > > I have uploaded mesos-executor using: hadoop fs -copyFromLocal
> > > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > > /user/mesos/mesos-executor
> > > > > > Did I do the right thing? Thanks very much!
> > > > > >
> > > > > > The log in jobtracker is:
> > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task Task_Tracker_82 on http://slave1:31000
> > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
> > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update of Task_Tracker_82 to TASK_LOST with message Executor terminated
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker Status
> > > > > > Pending Map Tasks: 2
> > > > > > Pending Reduce Tasks: 1
> > > > > > Idle Map Slots: 0
> > > > > > Idle Reduce Slots: 0
> > > > > > Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > > > > Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > > > > Needed Map Slots: 2
> > > > > > Needed Reduce Slots: 1
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task Task_Tracker_83 on http://slave1:31000
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update of Task_Tracker_83 to TASK_LOST with message Executor terminated
> > > > > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker Status
> > > > > > Pending Map Tasks: 2
> > > > > > Pending Reduce Tasks: 1
> > > > > > Idle Map Slots: 0
> > > > > > Idle Reduce Slots: 0
> > > > > > Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > > > > Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > > > > Needed Map Slots: 2
> > > > > > Needed Reduce Slots: 1
> > > > > >
> > > > > > Wang Yu
> > > > > >
> > > > > > From: Benjamin Mahler
> > > > > > Sent: 2013-04-24 07:49
> > > > > > To: [email protected]; wangyu
> > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> > > > > >
> > > > > > You need to instead upload the hadoop.tar.gz generated by the
> > > > > > tutorial.
> > > > > >
> > > > > > Then point the conf file to the hdfs directory (you had the right
> > > > > > idea, just uploaded the wrong file). :)
> > > > > >
> > > > > > Can you try that and report back?
> > > > > >
> > > > > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <[email protected]> wrote:
> > > > > >
> > > > > > > Guodong,
> > > > > > >
> > > > > > > I still have problems; I think something is wrong with my
> > > > > > > executor setting.
> > > > > > > > > > > > > > In mapred-site.xml, I set:("master" is the hostname of > > > > > > > mesos-master-hostname) > > > > > > > <property> > > > > > > > <name>mapred.mesos.executor</name> > > > > > > > # <value>hdfs://hdfs.name.node:port/hadoop.zip</value> > > > > > > > <value>hdfs://master/user/mesos/mesos-executor</value> > > > > > > > </property> > > > > > > > > > > > > > > And I upload mesos-executor in /user/mesos/mesos-executor > > > > > > > > > > > > > > The head content is as follows: > > > > > > > > > > > > > > #! /bin/sh > > > > > > > > > > > > > > > > > > > > # mesos-executor - temporary wrapper script for > > > > > > > .libs/mesos-executor > > > > > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b > > > > > > > # > > > > > > > > > > # The mesos-executor program cannot be directly executed until all > > > > the > > > > > > > libtool > > > > > > > # libraries that it depends on are installed. > > > > > > > # > > > > > > > # This wrapper script should never be moved out of the build > > > > directory. > > > > > > > # If it is, it will not operate correctly. > > > > > > > > > > > > > > # Sed substitution that helps us do robust quoting. It > > > > backslashifies > > > > > > > > > > > > > # metacharacters that are still active within double-quoted > > > > > > > strings. > > > > > > > Xsed='/bin/sed -e 1s/^X//' > > > > > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g' > > > > > > > > > > > > > > # Be Bourne compatible > > > > > > > > > > > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; > > > > > > > then > > > > > > > emulate sh > > > > > > > NULLCMD=: > > > > > > > # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which > > > > > > > # is contrary to our usage. Disable this feature. 
> > > > > > > alias -g '${1+"$@"}'='"$@"' > > > > > > > setopt NO_GLOB_SUBST > > > > > > > else > > > > > > > case `(set -o) 2>/dev/null` in *posix*) set -o posix;; esac > > > > > > > fi > > > > > > > BIN_SH=xpg4; export BIN_SH # for Tru64 > > > > > > > DUALCASE=1; export DUALCASE # for MKS sh > > > > > > > > > > > > > > > > > > > > # The HP-UX ksh and POSIX shell print the target directory to > > > > > > > stdout > > > > > > > # if CDPATH is set. > > > > > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH > > > > > > > > > > > > > > relink_command="(cd /home/mesos/build/src; { test -z > > > > > > > > > > \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || { LIBRARY_PATH=; > > > > > export > > > > > > > > LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" || unset > > > > > > > > > > > > > COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; }; }; { > > > > > > > test > > > > > -z > > > > > > > \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || { > > > > > > GCC_EXEC_PREFIX=; > > > > > > > > > > export GCC_EXEC_PREFIX; }; }; { test -z \"\${LD_RUN_PATH+set}\" || > > > > > unset > > > > > > > LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; }; > > > > > > > > > > > > > > > > > > > > > > > > > > > > LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/; > > > > > > > export LD_LIBRARY_PATH; > > > > > > > > > > > > > > > > > > > > > > > > > > > > PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin; > > > > > > > export PATH; g++ -g -g2 -O2 
-o \$progdir/\$file > > > > > > > launcher/mesos_executor-executor.o ./.libs/libmesos.so > > > > > > > > > > > > > -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread -lcurl > > > > > > > -lssl > > > > > > > > > > > > > -lcrypto -lz -lrt -pthread -Wl,-rpath > > > > > > > -Wl,/home/mesos/build/src/.libs > > > > > > > -Wl,-rpath -Wl,/home/mesos/build/lib)" > > > > > > > ... > > > > > > > > > > > > > > > > > > > > > > Did I upload the right file? and set up it in conf file correct? > > > > Thanks > > > > > > > very much! > > > > > > > > > > > > > > > > > > > > > > > > > > > > Wang Yu > > > > > > > > > > > > > > From: 王国栋 > > > > > > > Date: 2013-04-23 13:32 > > > > > > > To: wangyu > > > > > > > CC: mesos-dev > > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: > > > > > Unknown/exited > > > > > > > TaskTracker: http://slave5:50060 > > > > > > > Hmm. it seems that the mapred.mesos.master is set correctly. > > > > > > > > > > > > > > > if you run hadoop in local mode, use the following setting is ok > > > > > > > <property> > > > > > > > <name>mapred.mesos.master</name> > > > > > > > <value>local</value> > > > > > > > </property> > > > > > > > > > > > > > > > if you want to start the cluster. set mapred.mesos.master as the > > > > > > > mesos-master-hostname:mesos-master-port. > > > > > > > > > > > > > > > Make sure the dns parser result for mesos-master-hostname is the > > > > right > > > > > > ip. > > > > > > > > > > > > > > > > > > > > BTW: when you starting the jobtracker, you can check mesos webUI > > > > > > > and > > > > > > check > > > > > > > if there is hadoop framework registered. > > > > > > > > > > > > > > Thanks. 
> > > > > > > > > > > > > > Guodong > > > > > > > > > > > > > > > > > > > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <[email protected] > > > > wrote: > > > > > > > > > > > > > > > ** > > > > > > > > Hi, Guodong, > > > > > > > > > > > > > > > > I start hadoop as you said, then I saw this error: > > > > > > > > > > > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from > > > > > > > > scheduler > > > > > > > driver: Cannot parse > > > > > > > > '@0.0.0.0:0' > > > > > > > > > > > > > > > > > > > > > > What's this mean? where should I change MesosScheduler code to > > > > > > > > fix > > > > > > this? > > > > > > > > > Thanks very much! I am so sorry for interrupt you once again... > > > > > > > > > > > > > > > > The whole log is as follows: > > > > > > > > > > > > > > > > [root@master hadoop-0.20.205.0]# hadoop jobtracker > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG: > > > > > > > > /************************************************************ > > > > > > > > STARTUP_MSG: Starting JobTracker > > > > > > > > STARTUP_MSG: host = master/192.168.0.2 > > > > > > > > STARTUP_MSG: args = [] > > > > > > > > STARTUP_MSG: version = 0.20.205.0 > > > > > > > > > > > > > > > > STARTUP_MSG: build = -r ; compiled by 'root' on Sat Apr 13 > > > > > 11:19:33 > > > > > > > CST 2013 > > > > > > > > ************************************************************/ > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded properties > > > > > > > > from > > > > > > > hadoop-metrics2.properties > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for > > > > > > > > source > > > > > > > MetricsSystem,sub=Stats registered. > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled > > > > > > > > snapshot > > > > > > period > > > > > > > at 10 second(s). 
> > > > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker > > > > > > > > metrics > > > > > > system > > > > > > > started > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for > > > > > > > > source > > > > > > > QueueMetrics,q=default registered. > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for > > > > > > > > source > > > > > ugi > > > > > > > registered. > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO > > > > > delegation.AbstractDelegationTokenSecretManager: > > > > > > > > Updating the current master key for generating delegation tokens > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO > > > > > delegation.AbstractDelegationTokenSecretManager: > > > > > > > Starting expired delegation token remover thread, > > > > > > > tokenRemoverScanInterval=60 min(s) > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler configured > > > > > > > > with > > > > > > > (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, > > > > > limitMaxMemForMapTasks, > > > > > > > limitMaxMemForReduceTasks) (-1, -1, -1, -1) > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO > > > > > delegation.AbstractDelegationTokenSecretManager: > > > > > > > > Updating the current master key for generating delegation tokens > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing hosts > > > > > > > (include/exclude) list > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting jobtracker > > > > > > > > with > > > > > > owner > > > > > > > as root > > > > > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for > > > > > > > > source > > > > > > > RpcDetailedActivityForPort9001 registered. 
> > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for > > > > > > > > source > > > > > > > RpcActivityForPort9001 registered. > > > > > > > > > > > > > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to > > > > > > > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via > > > > > > > org.mortbay.log.Slf4jLog > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global > > > > > > > > filtersafety > > > > > > > (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) > > > > > > > > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by > > > > > > > > webServer.getConnectors()[0].getLocalPort() before open() is -1. > > > > > Opening > > > > > > > the listener on 50030 > > > > > > > > > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: listener.getLocalPort() > > > > > > returned > > > > > > > > 50030 webServer.getConnectors()[0].getLocalPort() returned 50030 > > > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to port > > > > > > > > 50030 > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26 > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: Started > > > > > > > > [email protected]:50030 > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for > > > > > > > > source > > > > > jvm > > > > > > > registered. > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for > > > > > > > > source > > > > > > > JobTrackerMetrics registered. 
> > > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up at: 9001 > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker webserver: > > > > 50030 > > > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the system > > > > > > > directory > > > > > > > > > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server being > > > > > > > initialized in embedded mode > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started job > > > > > > > > history > > > > > > > server at: localhost:50030 > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History Server web > > > > > > > address: localhost:50030 > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore: Completed > > > > job > > > > > > > store is inactive > > > > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting > > > > MesosScheduler > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing hosts > > > > > information > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from > > > > > > > > scheduler > > > > > > > driver: Cannot parse '@ > > > > > > > > 0.0.0.0:0' > > > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the > > > > > > > > includes > > > > > file > > > > > > to > > > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the > > > > > > > > excludes > > > > > file > > > > > > to > > > > > > > > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing hosts > > > > > > > (include/exclude) list > > > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning 0 > > > > > > > > nodes > > > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder: > > > > > > > > starting > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on 9001: > > > > > > starting > > > > > > > 
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0 on 9001: > > > > > > starting > > > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1 on 9001: > > > > > > starting > > > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3 on 9001: > > > > > > starting > > > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2 on 9001: > > > > > > starting > > > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5 on 9001: > > > > > > starting > > > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4 on 9001: > > > > > > starting > > > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6 on 9001: > > > > > > starting > > > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7 on 9001: > > > > > > starting > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting RUNNING > > > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8 on 9001: > > > > > > starting > > > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9 on 9001: > > > > > > starting > > > > > > > > > > > > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to load > > > > > > > > > > > > > native-hadoop library for your platform... 
using builtin-java > > > > > > > classes > > > > > > where > > > > > > > applicable > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: > > > > > > > > job_201304231321_0001: > > > > > > > nMaps=0 nReduces=0 max=-1 > > > > > > > > > > > > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job > > > > > > > job_201304231321_0001 > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job > > > > > > > > job_201304231321_0001 > > > > > > > added successfully for user 'root' to queue 'default' > > > > > > > > > > > > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root > > > > > IP=192.168.0.2 > > > > > > > OPERATION=SUBMIT_JOB TARGET=job_201304231321_0001 > > > > RESULT=SUCCESS > > > > > > > > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing > > > > > > > job_201304231321_0001 > > > > > > > > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing > > > > > > > job_201304231321_0001 > > > > > > > > > > > > > > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken generated > > > > > > > > and > > > > > > > stored with users keys in > > > > > > > > /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken > > > > > > > > > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size for job > > > > > > > job_201304231321_0001 = 0. Number of splits = 0 > > > > > > > > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job > > > > > job_201304231321_0001 > > > > > > > initialized successfully with 0 map tasks and 0 reduce tasks. 
> > > > > > > >
> > > > > > > > ------------------------------
> > > > > > > > Wang Yu
> > > > > > > >
> > > > > > > > *From:* 王国栋 <[email protected]>
> > > > > > > > *Date:* 2013-04-23 11:34
> > > > > > > > *To:* mesos-dev <[email protected]>; wangyu <[email protected]>
> > > > > > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> > > > > > > > Hi Yu,
> > > > > > > >
> > > > > > > > Mesos will just launch a tasktracker on each slave node as long as
> > > > > > > > the available resources are enough for the tasktracker. So you have
> > > > > > > > to run the NameNode, JobTracker and DataNode on your own.
> > > > > > > >
> > > > > > > > Basically, starting hadoop on mesos goes like this.
> > > > > > > > 1. Start the dfs. Use hadoop/bin/start-dfs.sh. (You should configure
> > > > > > > > core-site.xml and hdfs-site.xml.) The dfs is no different from the
> > > > > > > > normal one.
> > > > > > > > 2. Start the jobtracker. Use hadoop/bin/hadoop jobtracker. (You should
> > > > > > > > configure mapred-site.xml; this jobtracker should contain the patch
> > > > > > > > for mesos.)
> > > > > > > >
> > > > > > > > Then you can use the mesos web UI and the jobtracker web UI to check
> > > > > > > > the status of the JobTracker.
> > > > > > > >
> > > > > > > > Guodong
> > > > > > > >
> > > > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <[email protected]> wrote:
> > > > > > > >
> > > > > > > >> Oh, yes, I start my hadoop using "start-all.sh". I know what my
> > > > > > > >> problem is. Thanks very much!
> > > > > > > >>
> > > > > > > >> ps: Besides TaskTracker, are there any other roles (like JobTracker,
> > > > > > > >> DataNode) I should stop first?
> > > > > > > >>
> > > > > > > >> Wang Yu
> > > > > > > >>
> > > > > > > >> From: Benjamin Mahler
> > > > > > > >> Sent: 2013-04-23 10:56
> > > > > > > >> To: [email protected]; wangyu
> > > > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> > > > > > > >> The scheduler we wrote for Hadoop will start its own TaskTrackers,
> > > > > > > >> meaning you do not have to start any TaskTrackers yourself.
> > > > > > > >>
> > > > > > > >> Are you starting your own TaskTrackers? Are there any TaskTrackers
> > > > > > > >> running in your cluster?
> > > > > > > >>
> > > > > > > >> Looking at your jps output, is there already a TaskTracker running?
> > > > > > > >> [root@master logs]# jps
> > > > > > > >> 13896 RunJar
> > > > > > > >> 14123 Jps
> > > > > > > >> 12718 NameNode
> > > > > > > >> 12900 DataNode
> > > > > > > >> 13374 TaskTracker <--- How was this started?
> > > > > > > >> 13218 JobTracker
> > > > > > > >>
> > > > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <[email protected]> wrote:
> > > > > > > >>
> > > > > > > >> > Hi, Ben and Guodong,
> > > > > > > >> >
> > > > > > > >> > What do you mean by "managing your own TaskTrackers"? How should I
> > > > > > > >> > know whether I am managing my own TaskTrackers? Sorry, I am not
> > > > > > > >> > very familiar with mesos.
> > > > > > > >> > Does it mean I do not need to configure hdfs-site.xml and
> > > > > > > >> > core-site.xml in hadoop? I do not want to run my own TaskTracker,
> > > > > > > >> > I just want to set up hadoop on mesos, and run my MR tasks.
> > > > > > > >> >
> > > > > > > >> > Thanks very much for your patient reply... Maybe I have a long way
> > > > > > > >> > to go...
> > > > > > > >> >
> > > > > > > >> > The log messages you see:
> > > > > > > >> > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> >
> > > > > > > >> > Are printed when mesos does not know about the TaskTracker. We
> > > > > > > >> > currently don't support running your own TaskTrackers, as the
> > > > > > > >> > MesosScheduler will launch them on your behalf when needed.
> > > > > > > >> >
> > > > > > > >> > Are you managing your own TaskTrackers? The purpose of using Hadoop
> > > > > > > >> > with mesos is that you no longer have to do that. We will detect
> > > > > > > >> > that jobs have pending map / reduce tasks and launch TaskTrackers
> > > > > > > >> > accordingly.
> > > > > > > >> >
> > > > > > > >> > Guodong may be able to help further getting set up!
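
Since the MesosScheduler is meant to launch TaskTrackers itself, a quick pre-flight check is to confirm that no TaskTracker is already running on a node before the JobTracker is started. The sketch below parses `jps` output of the form quoted in this thread ("<pid> <main class>"); the function name `find_processes` and the `SAMPLE` text are illustrative. It cannot tell how a TaskTracker was started, only that one exists.

```python
def find_processes(jps_output, name="TaskTracker"):
    """Return the PIDs of JVMs whose jps main-class column equals `name`."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        # Expect "<pid> <main class>"; shell prompts and blank lines are ignored.
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] == name:
            pids.append(int(parts[0]))
    return pids


# Sample taken from the jps listing quoted in the thread.
SAMPLE = """\
13896 RunJar
14123 Jps
12718 NameNode
12900 DataNode
13374 TaskTracker
13218 JobTracker
"""

if __name__ == "__main__":
    stray = find_processes(SAMPLE)
    if stray:
        print("TaskTracker already running, PIDs:", stray)
    else:
        print("No TaskTracker found")
```

In practice the input would come from running `jps` on each node, for example through `subprocess.run(["jps"], capture_output=True, text=True).stdout`.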
> > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > Wang Yu > > > > > > > >> > > > > > > > > >> > From: 王国栋 > > > > > > > >> > Date: 2013-04-18 17:10 > > > > > > > >> > To: mesos-dev; wangyu > > > > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler: > > > > > Unknown/exited > > > > > > > >> > TaskTracker: http://slave5:50060 > > > > > > > > > > > > > >> > You can check the slave log and the mesos-executor log, > > > > > > > >> > which is > > > > > > > >> normally > > > > > > > >> > located in the dir like > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr". > > > > > > > >> > The log is tasktracker log. > > > > > > > >> > > > > > > > > >> > I hope it will help. > > > > > > > >> > > > > > > > > >> > Guodong > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 < > [email protected] > > > > > > > > wrote: > > > > > > > >> > > > > > > > > >> > > ** > > > > > > > >> > > Hi All, > > > > > > > >> > > > > > > > > > > > > >> > > I have deployed mesos on three node: master, slave1, > > > > > > > >> > > slave5. > > > > and > > > > > > it > > > > > > > >> works > > > > > > > >> > > well. > > > > > > > > >> > > Then I set hadoop over it, using master as namenode, and > > > > > master, > > > > > > > >> slave1, > > > > > > > > > > >> > > slave5 as datanode. When I using 'jps', it looks works > > > > > > > >> > > well. > > > > > > > >> > > [root@master logs]# jps > > > > > > > >> > > 13896 RunJar > > > > > > > >> > > 14123 Jps > > > > > > > >> > > 12718 NameNode > > > > > > > >> > > 12900 DataNode > > > > > > > >> > > 13374 TaskTracker > > > > > > > >> > > 13218 JobTracker > > > > > > > >> > > > > > > > > > >> > > Then I run test benchmark, it can not go on working... 
> [root@master hadoop-0.20.205.0]# bin/hadoop jar hadoop-examples-0.20.205.0.jar randomwriter -Dtest.randomwrite.bytes_per_map=6710886 -Dtest.randomwriter.maps_per_host=10 rand
> Running 30 maps.
> Job started: Thu Apr 18 16:49:36 CST 2013
> 13/04/18 16:49:36 INFO mapred.JobClient: Running job: job_201304181646_0001
> 13/04/18 16:49:37 INFO mapred.JobClient: map 0% reduce 0%
> It stopped here.
>
> Then I read the log file: hadoop-root-jobtracker-master.log, it shows:
> 2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:47:04,609 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:47:07,618 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:47:10,625 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:47:13,632 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:47:13,686 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/slave5
> 2013-04-18 16:47:13,686 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_slave5:127.0.0.1/127.0.0.1:60621 to host slave5
> 2013-04-18 16:47:13,687 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> 2013-04-18 16:47:16,638 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:47:16,697 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:47:19,707 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> 2013-04-18 16:47:22,651 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:47:22,715 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> 2013-04-18 16:47:25,658 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> 2013-04-18 16:47:25,725 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> 2013-04-18 16:47:28,665 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
>
> Can anybody help me? Thanks very much!
