Hi Sonal,

Why are you using Hadoop 0.20.0? It's fairly old and there are lots of
fixes in 0.20.1, and more in 0.20.2 which should be released any
minute now.

In particular, you're missing this change:
https://issues.apache.org/jira/browse/HADOOP-5921

which makes the JobTracker stubbornly wait for DFS to appear.

I'd recommend using either (a) Apache 0.20.1, (b) Owen's rc of 0.20.2,
or (c) Cloudera's 0.20.1 based build at
http://archive.cloudera.com/cdh/2/hadoop-0.20.1+169.56.tar.gz which is
0.20.1 plus 225 extra patches (incl most of what's in 0.20.2).
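If you do stay on 0.20.0 for the moment, a less fragile workaround than a fixed sleep is to block until HDFS actually leaves safe mode before starting the MapReduce daemons. A rough sketch (untested against a real cluster; the retry wrapper is just an illustration, not something that ships with Hadoop):

```shell
# Sketch: a replacement for a fixed "sleep 60" in start-all.sh.
# "hadoop dfsadmin -safemode wait" blocks until the NameNode leaves safe
# mode, but right after start-dfs.sh the NameNode RPC port may not be
# listening yet, so retry a few times before giving up.
# Assumes $HADOOP_HOME points at the Hadoop install root.
wait_for_dfs() {
  retries=10
  while [ "$retries" -gt 0 ]; do
    if "$HADOOP_HOME"/bin/hadoop dfsadmin -safemode wait 2>/dev/null; then
      return 0    # HDFS is out of safe mode and ready for writes
    fi
    retries=$((retries - 1))
    sleep 5       # NameNode not answering RPCs yet; back off and retry
  done
  return 1
}

# In start-all.sh this slots in between the two existing calls:
#   "$bin"/start-dfs.sh --config $HADOOP_CONF_DIR
#   wait_for_dfs || { echo "HDFS did not come up" >&2; exit 1; }
#   "$bin"/start-mapred.sh --config $HADOOP_CONF_DIR
```

That way the JobTracker only starts once DFS can actually accept writes, instead of guessing at a sleep duration.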

-Todd

On Sat, Feb 13, 2010 at 8:35 AM, Sonal Goyal <[email protected]> wrote:
> Hi Aaron,
>
> I am on Hadoop 0.20.0 on Ubuntu, pseudo distributed mode. If I remove the
> sleep time from my start-all.sh script, my jobtracker comes up momentarily
> and then dies.
>
> Here is a capture of my commands:
>
> sgo...@desktop:~/software/hadoop-0.20.0$ bin/hadoop namenode -format
> 10/02/13 21:54:19 INFO namenode.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = desktop/127.0.1.1
> STARTUP_MSG:   args = [-format]
> STARTUP_MSG:   version = 0.20.0
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504;
> compiled by 'ndaley' on Thu Apr  9 05:18:40 UTC 2009
> ************************************************************/
> 10/02/13 21:54:19 DEBUG conf.Configuration: java.io.IOException: config()
>    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:210)
>    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:197)
>    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:937)
>    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:964)
>
> Re-format filesystem in /tmp/hadoop-sgoyal/dfs/name ? (Y or N) Y
> 10/02/13 21:54:22 DEBUG security.UserGroupInformation: Unix Login:
> sgoyal,sgoyal,adm,dialout,cdrom,audio,plugdev,fuse,lpadmin,admin,sambashare,mysql,cvsgroup
> 10/02/13 21:54:22 INFO namenode.FSNamesystem:
> fsOwner=sgoyal,sgoyal,adm,dialout,cdrom,audio,plugdev,fuse,lpadmin,admin,sambashare,mysql,cvsgroup
> 10/02/13 21:54:22 INFO namenode.FSNamesystem: supergroup=supergroup
> 10/02/13 21:54:22 INFO namenode.FSNamesystem: isPermissionEnabled=true
> 10/02/13 21:54:22 INFO common.Storage: Image file of size 96 saved in 0
> seconds.
> 10/02/13 21:54:22 DEBUG namenode.FSNamesystem: Preallocating Edit log,
> current size 0
> 10/02/13 21:54:22 DEBUG namenode.FSNamesystem: Edit log size is now 1049088
> written 512 bytes  at offset 1048576
> 10/02/13 21:54:22 INFO common.Storage: Storage directory
> /tmp/hadoop-sgoyal/dfs/name has been successfully formatted.
> 10/02/13 21:54:22 INFO namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at desktop/127.0.1.1
> ************************************************************/
>
>
> sgo...@desktop:~/software/hadoop-0.20.0$ bin/start-all.sh
> starting namenode, logging to
> /home/sgoyal/software/hadoop-0.20.0/bin/../logs/hadoop-sgoyal-namenode-desktop.out
> localhost: starting datanode, logging to
> /home/sgoyal/software/hadoop-0.20.0/bin/../logs/hadoop-sgoyal-datanode-desktop.out
> localhost: starting secondarynamenode, logging to
> /home/sgoyal/software/hadoop-0.20.0/bin/../logs/hadoop-sgoyal-secondarynamenode-desktop.out
> starting jobtracker, logging to
> /home/sgoyal/software/hadoop-0.20.0/bin/../logs/hadoop-sgoyal-jobtracker-desktop.out
> localhost: starting tasktracker, logging to
> /home/sgoyal/software/hadoop-0.20.0/bin/../logs/hadoop-sgoyal-tasktracker-desktop.out
>
> sgo...@desktop:~/software/hadoop-0.20.0$ jps
> 26171 Jps
> 26037 JobTracker
> 25966 SecondaryNameNode
> 25778 NameNode
> 26130 TaskTracker
> 25863 DataNode
>
> sgo...@desktop:~/software/hadoop-0.20.0$ jps
> 26037 JobTracker
> 25966 SecondaryNameNode
> 26203 Jps
> 25778 NameNode
> 26130 TaskTracker
> 25863 -- process information unavailable
>
> sgo...@desktop:~/software/hadoop-0.20.0$ jps
> 26239 Jps
> 26037 JobTracker
> 25966 SecondaryNameNode
> 25778 NameNode
> 26130 TaskTracker
>
> sgo...@desktop:~/software/hadoop-0.20.0$ jps
> 26037 JobTracker
> 25966 SecondaryNameNode
> 25778 NameNode
> 26130 TaskTracker
> 26252 Jps
>
> sgo...@desktop:~/software/hadoop-0.20.0$ jps
> 26288 Jps
> 25966 SecondaryNameNode
> 25778 NameNode
>
> sgo...@desktop:~/software/hadoop-0.20.0$ jps
> 25966 SecondaryNameNode
> 25778 NameNode
> 26298 Jps
>
> sgo...@desktop:~/software/hadoop-0.20.0$ jps
> 26308 Jps
> 25966 SecondaryNameNode
> 25778 NameNode
>
> My jobtracker logs show:
>
> 2010-02-13 21:54:40,660 INFO org.apache.hadoop.mapred.JobTracker:
> STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting JobTracker
> STARTUP_MSG:   host = desktop/127.0.1.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.0
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504;
> compiled by 'ndaley' on Thu Apr  9 05:18:40 UTC 2009
> ************************************************************/
> 2010-02-13 21:54:40,967 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=JobTracker, port=9001
> 2010-02-13 21:54:52,100 INFO org.mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 2010-02-13 21:54:52,358 INFO org.apache.hadoop.http.HttpServer: Jetty bound
> to port 50030
> 2010-02-13 21:54:52,359 INFO org.mortbay.log: jetty-6.1.14
> 2010-02-13 21:55:13,222 INFO org.mortbay.log: Started
> [email protected]:50030
> 2010-02-13 21:55:13,227 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2010-02-13 21:55:13,229 INFO org.apache.hadoop.mapred.JobTracker: JobTracker
> up at: 9001
> 2010-02-13 21:55:13,229 INFO org.apache.hadoop.mapred.JobTracker: JobTracker
> webserver: 50030
> 2010-02-13 21:55:13,942 INFO org.apache.hadoop.mapred.JobTracker: Cleaning
> up the system directory
> 2010-02-13 21:55:14,049 INFO org.apache.hadoop.hdfs.DFSClient:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /tmp/hadoop-sgoyal/mapred/system/jobtracker.info could only be replicated to
> 0 nodes, instead of 1
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1256)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:739)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>        at $Proxy4.addBlock(Unknown Source)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy4.addBlock(Unknown Source)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2873)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2755)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
>
> 2010-02-13 21:55:14,049 WARN org.apache.hadoop.hdfs.DFSClient:
> NotReplicatedYetException sleeping /tmp/hadoop-sgoyal/mapred/system/jobtracker.info retries left 4
> 2010-02-13 21:55:14,459 INFO org.apache.hadoop.hdfs.DFSClient:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /tmp/hadoop-sgoyal/mapred/system/jobtracker.info could only be replicated to
> 0 nodes, instead of 1
>
>
> I suspected the dfs was not ready, and the sleep seems to solve this issue.
> Look forward to hearing your take on this. Please feel free to let me know
> if you need any other info.
>
> Thanks and Regards,
> Sonal
>
>
> On Sat, Feb 13, 2010 at 6:40 AM, Aaron Kimball <[email protected]> wrote:
>
>> Sonal,
>>
>> Can I ask why you're sleeping between starting hdfs and mapreduce? I've
>> never needed this in my own code. In general, Hadoop is pretty tolerant
>> about starting daemons "out of order."
>>
>> If you need to wait for HDFS to be ready and come out of safe mode before
>> launching a job, that's another story, but you can accomplish that with:
>>
>> $HADOOP_HOME/bin/hadoop dfsadmin -safemode wait
>>
>> ... which will block until HDFS is ready for user commands in read/write
>> mode.
>> - Aaron
>>
>>
>> On Fri, Feb 12, 2010 at 8:44 AM, Sonal Goyal <[email protected]>
>> wrote:
>>
>> > Hi
>> >
>> > I had faced a similar issue on Ubuntu and Hadoop 0.20 and modified the
>> > start-all script to introduce a sleep time:
>> >
>> > bin=`dirname "$0"`
>> > bin=`cd "$bin"; pwd`
>> >
>> > . "$bin"/hadoop-config.sh
>> >
>> > # start dfs daemons
>> > "$bin"/start-dfs.sh --config $HADOOP_CONF_DIR
>> > echo 'sleeping'
>> > sleep 60
>> > echo 'awake'
>> > # start mapred daemons
>> > "$bin"/start-mapred.sh --config $HADOOP_CONF_DIR
>> >
>> >
>> > This seems to work. Please see if this works for you.
>> > Thanks and Regards,
>> > Sonal
>> >
>> >
>> > On Thu, Feb 11, 2010 at 3:56 AM, E. Sammer <[email protected]> wrote:
>> >
>> > > On 2/10/10 5:19 PM, Nick Klosterman wrote:
>> > >
>> > >> @E.Sammer, no I don't *think* that it is part of another cluster. The
>> > >> tutorial is for a single-node cluster, just an initial set up to see
>> > >> if you can get things up and running. I have reformatted the namenode
>> > >> several times in my effort to get Hadoop to work.
>> > >>
>> > >
>> > > What I mean is that the data node, at some point, connected to your
>> > > name node. If you reformat the name node, the data node must be wiped
>> > > clean; it's effectively trying to join a name node that no longer
>> > > exists.
>> > >
>> > >
>> > > --
>> > > Eric Sammer
>> > > [email protected]
>> > > http://esammer.blogspot.com
>> > >
>> >
>>
>
