Hi Aaron,

I am running Hadoop 0.20.0 on Ubuntu in pseudo-distributed mode. If I remove the sleep from my start-all.sh script, my jobtracker comes up momentarily and then dies.
Here is a capture of my commands:

sgo...@desktop:~/software/hadoop-0.20.0$ bin/hadoop namenode -format
10/02/13 21:54:19 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = desktop/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr 9 05:18:40 UTC 2009
************************************************************/
10/02/13 21:54:19 DEBUG conf.Configuration: java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:210)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:197)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:937)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:964)
Re-format filesystem in /tmp/hadoop-sgoyal/dfs/name ? (Y or N) Y
10/02/13 21:54:22 DEBUG security.UserGroupInformation: Unix Login: sgoyal,sgoyal,adm,dialout,cdrom,audio,plugdev,fuse,lpadmin,admin,sambashare,mysql,cvsgroup
10/02/13 21:54:22 INFO namenode.FSNamesystem: fsOwner=sgoyal,sgoyal,adm,dialout,cdrom,audio,plugdev,fuse,lpadmin,admin,sambashare,mysql,cvsgroup
10/02/13 21:54:22 INFO namenode.FSNamesystem: supergroup=supergroup
10/02/13 21:54:22 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/02/13 21:54:22 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/02/13 21:54:22 DEBUG namenode.FSNamesystem: Preallocating Edit log, current size 0
10/02/13 21:54:22 DEBUG namenode.FSNamesystem: Edit log size is now 1049088 written 512 bytes at offset 1048576
10/02/13 21:54:22 INFO common.Storage: Storage directory /tmp/hadoop-sgoyal/dfs/name has been successfully formatted.
10/02/13 21:54:22 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at desktop/127.0.1.1
************************************************************/

sgo...@desktop:~/software/hadoop-0.20.0$ bin/start-all.sh
starting namenode, logging to /home/sgoyal/software/hadoop-0.20.0/bin/../logs/hadoop-sgoyal-namenode-desktop.out
localhost: starting datanode, logging to /home/sgoyal/software/hadoop-0.20.0/bin/../logs/hadoop-sgoyal-datanode-desktop.out
localhost: starting secondarynamenode, logging to /home/sgoyal/software/hadoop-0.20.0/bin/../logs/hadoop-sgoyal-secondarynamenode-desktop.out
starting jobtracker, logging to /home/sgoyal/software/hadoop-0.20.0/bin/../logs/hadoop-sgoyal-jobtracker-desktop.out
localhost: starting tasktracker, logging to /home/sgoyal/software/hadoop-0.20.0/bin/../logs/hadoop-sgoyal-tasktracker-desktop.out

sgo...@desktop:~/software/hadoop-0.20.0$ jps
26171 Jps
26037 JobTracker
25966 SecondaryNameNode
25778 NameNode
26130 TaskTracker
25863 DataNode

sgo...@desktop:~/software/hadoop-0.20.0$ jps
26037 JobTracker
25966 SecondaryNameNode
26203 Jps
25778 NameNode
26130 TaskTracker
25863 -- process information unavailable

sgo...@desktop:~/software/hadoop-0.20.0$ jps
26239 Jps
26037 JobTracker
25966 SecondaryNameNode
25778 NameNode
26130 TaskTracker

sgo...@desktop:~/software/hadoop-0.20.0$ jps
26037 JobTracker
25966 SecondaryNameNode
25778 NameNode
26130 TaskTracker
26252 Jps

sgo...@desktop:~/software/hadoop-0.20.0$ jps
26288 Jps
25966 SecondaryNameNode
25778 NameNode

sgo...@desktop:~/software/hadoop-0.20.0$ jps
25966 SecondaryNameNode
25778 NameNode
26298 Jps

sgo...@desktop:~/software/hadoop-0.20.0$ jps
26308 Jps
25966 SecondaryNameNode
25778 NameNode

My jobtracker logs show:

2010-02-13 21:54:40,660 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting JobTracker
STARTUP_MSG:   host = desktop/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr 9 05:18:40 UTC 2009
************************************************************/
2010-02-13 21:54:40,967 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=9001
2010-02-13 21:54:52,100 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2010-02-13 21:54:52,358 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030
2010-02-13 21:54:52,359 INFO org.mortbay.log: jetty-6.1.14
2010-02-13 21:55:13,222 INFO org.mortbay.log: Started [email protected]:50030
2010-02-13 21:55:13,227 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
2010-02-13 21:55:13,229 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 9001
2010-02-13 21:55:13,229 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
2010-02-13 21:55:13,942 INFO org.apache.hadoop.mapred.JobTracker: Cleaning up the system directory
2010-02-13 21:55:14,049 INFO org.apache.hadoop.hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-sgoyal/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

        at org.apache.hadoop.ipc.Client.call(Client.java:739)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy4.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy4.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2873)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2755)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
2010-02-13 21:55:14,049 WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping /tmp/hadoop-sgoyal/mapred/system/jobtracker.info retries left 4
2010-02-13 21:55:14,459 INFO org.apache.hadoop.hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-sgoyal/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

I suspected the DFS was not ready, and the sleep seems to solve this issue. Look forward to hearing your take on this. Please feel free to let me know if you need any other info.
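For what it's worth, instead of a fixed 60-second sleep I could probably wait on the DFS explicitly. A rough sketch of the idea (untested on my box; it assumes HADOOP_HOME is set, and the "Datanodes available:" grep pattern is my guess at the `dfsadmin -report` output format):

```shell
#!/bin/sh
# Sketch: start DFS, then block until it is actually usable before
# starting MapReduce, rather than sleeping for a fixed interval.

"$HADOOP_HOME"/bin/start-dfs.sh

# Block until the namenode leaves safe mode.
"$HADOOP_HOME"/bin/hadoop dfsadmin -safemode wait

# Also wait until at least one datanode has registered, since the
# "could only be replicated to 0 nodes" error means no live datanodes yet.
until "$HADOOP_HOME"/bin/hadoop dfsadmin -report 2>/dev/null \
      | grep -q 'Datanodes available: [1-9]'; do
  sleep 2
done

"$HADOOP_HOME"/bin/start-mapred.sh
```

That way the startup waits only as long as it actually needs to.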
Thanks and Regards,
Sonal

On Sat, Feb 13, 2010 at 6:40 AM, Aaron Kimball <[email protected]> wrote:
> Sonal,
>
> Can I ask why you're sleeping between starting hdfs and mapreduce? I've
> never needed this in my own code. In general, Hadoop is pretty tolerant
> about starting daemons "out of order."
>
> If you need to wait for HDFS to be ready and come out of safe mode before
> launching a job, that's another story, but you can accomplish that with:
>
> $HADOOP_HOME/bin/hadoop dfsadmin -safemode wait
>
> ... which will block until HDFS is ready for user commands in read/write
> mode.
> - Aaron
>
> On Fri, Feb 12, 2010 at 8:44 AM, Sonal Goyal <[email protected]> wrote:
>
> > Hi
> >
> > I had faced a similar issue on Ubuntu and Hadoop 0.20 and modified the
> > start-all script to introduce a sleep time:
> >
> > bin=`dirname "$0"`
> > bin=`cd "$bin"; pwd`
> >
> > . "$bin"/hadoop-config.sh
> >
> > # start dfs daemons
> > "$bin"/start-dfs.sh --config $HADOOP_CONF_DIR
> > echo 'sleeping'
> > sleep 60
> > echo 'awake'
> > # start mapred daemons
> > "$bin"/start-mapred.sh --config $HADOOP_CONF_DIR
> >
> > This seems to work. Please see if this works for you.
> >
> > Thanks and Regards,
> > Sonal
> >
> > On Thu, Feb 11, 2010 at 3:56 AM, E. Sammer <[email protected]> wrote:
> >
> > > On 2/10/10 5:19 PM, Nick Klosterman wrote:
> > >
> > >> @E.Sammer, no I don't *think* that it is part of another cluster. The
> > >> tutorial is for a single-node cluster, just an initial setup to see if
> > >> you can get things up and running. I have reformatted the namenode
> > >> several times in my effort to get hadoop to work.
> > >
> > > What I mean is that the data node, at some point, connected to your
> > > name node. If you reformat the name node, the data node must be wiped
> > > clean; it's effectively trying to join a name node that no longer
> > > exists.
> > >
> > > --
> > > Eric Sammer
> > > [email protected]
> > > http://esammer.blogspot.com
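P.S. For anyone hitting the reformat issue Eric describes at the bottom of this thread: the reset he suggests would look roughly like this with the default /tmp layout from my logs above (a sketch only; if you have set dfs.data.dir in hdfs-site.xml, remove that directory instead):

```shell
#!/bin/sh
# Sketch: wipe the stale datanode storage after re-formatting the
# namenode, so the datanode doesn't try to rejoin a namenode that
# no longer exists. Assumes the default /tmp/hadoop-${USER} layout.

bin/stop-all.sh

# Remove the old datanode storage directory.
rm -rf /tmp/hadoop-${USER}/dfs/data

bin/hadoop namenode -format
bin/start-all.sh
```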
