Hello Everyone: I am adding the contents of my config files in the hope that someone will be able to help. See inline for the discussion so far. I really don't understand why it works in pseudo-distributed mode but gives so many problems in cluster mode. I have tried the instructions from the Apache cluster setup guide, the Yahoo Developer Network and Michael Noll's tutorial.
w1153435@ngs:~/hadoop-0.20.2_cluster/conf> cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ngs.uni.ac.uk:3000</value>
  </property>
  <property>
    <name>HADOOP_LOG_DIR</name>
    <value>/home/w1153435/hadoop-0.20.2_cluster/var/log/hadoop</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop</value>
  </property>
</configuration>

w1153435@ngs:~/hadoop-0.20.2_cluster/conf> cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:3500</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/w1153435/hadoop-0.20.2_cluster/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/w1153435/hadoop-0.20.2_cluster/dfs/name</value>
    <final>true</final>
  </property>
</configuration>

w1153435@ngs:~/hadoop-0.20.2_cluster/conf> cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>ngs.uni.ac.uk:3001</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/w1153435/hadoop-0.20.2_cluster/mapred/system</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>16</value>
  </property>
</configuration>

In addition:

w1153435@ngs:~/hadoop-0.20.2_cluster> bin/hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 161.74.12.36:50010
Decommission Status : Normal
Configured Capacity: 0 (0 KB)
DFS Used: 0 (0 KB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used%: 100%
DFS Remaining%: 0%
Last contact: Wed Aug 17 12:40:17 BST 2011
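
As a quick sanity check (a sketch only; the hostname, port and paths are simply the ones from the configs above, and the log path assumes the default $HADOOP_HOME/logs), something like the following run on a datanode should show whether it can reach the namenode and actually use dfs.data.dir:

CODE
# Sketch: verify name resolution and the namenode RPC port from a datanode,
# then check that dfs.data.dir exists, is writable and has space behind it.
ping -c 1 ngs.uni.ac.uk
telnet ngs.uni.ac.uk 3000          # or: nc -z ngs.uni.ac.uk 3000
ls -ld /home/w1153435/hadoop-0.20.2_cluster/dfs/data
df -h  /home/w1153435/hadoop-0.20.2_cluster/dfs/data
tail -n 50 logs/hadoop-*-datanode-*.log    # adjust if HADOOP_LOG_DIR points elsewhere
CODE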
Cheers,
A Df


>________________________________
>From: A Df <[email protected]>
>To: "[email protected]" <[email protected]>; "[email protected]" <[email protected]>
>Sent: Tuesday, 16 August 2011, 16:20
>Subject: Re: hadoop cluster mode not starting up
>
>See inline:
>
>>________________________________
>>From: shanmuganathan.r <[email protected]>
>>To: [email protected]
>>Sent: Tuesday, 16 August 2011, 13:35
>>Subject: Re: hadoop cluster mode not starting up
>>
>>Hi Df,
>>
>>Are you using IPs instead of names in conf/masters and conf/slaves? For running the secondary namenode on a separate machine, refer to the following link.
>>
>>=Yes, I use the names in those files, but the IP addresses are mapped to the names in the /extras/hosts file. Does this cause problems?
>>
>>http://www.hadoop-blog.com/2010/12/secondarynamenode-process-is-starting.html
>>
>>=I don't want to make too many changes, so I will stick to having the master be both namenode and secondarynamenode. I tried starting up HDFS and MapReduce, but the jobtracker is not running on the master, and there are still errors regarding the datanodes because only 5 of 7 datanodes have a tasktracker. I ran both start commands, for HDFS and for MapReduce, so why is the jobtracker missing?
>>
>>Regards,
>>
>>Shanmuganathan
>>
>>
>>---- On Tue, 16 Aug 2011 17:06:04 +0530 A Df <[email protected]> wrote ----
>>
>>I already used a few tutorials, as follows:
>> * The Hadoop Tutorial on the Yahoo Developer Network, which uses an old Hadoop release and thus older conf files.
>> * http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ which only has two nodes, with the master acting as namenode and secondary namenode. I need one with more than that.
>>
>>Is there a way to prevent the nodes from using the central file system? I don't have root permission, and my user folder is on a central file system which is replicated on all the nodes.
>>
>>See inline too for my responses
>>
>>
>>>________________________________
>>>From: Steve Loughran <[email protected]>
>>>To: [email protected]
>>>Sent: Tuesday, 16 August 2011, 12:08
>>>Subject: Re: hadoop cluster mode not starting up
>>>
>>>On 16/08/11 11:19, A Df wrote:
>>>> See inline
>>>>
>>>>> ________________________________
>>>>> From: Steve Loughran <[email protected]>
>>>>> To: [email protected]
>>>>> Sent: Tuesday, 16 August 2011, 11:08
>>>>> Subject: Re: hadoop cluster mode not starting up
>>>>>
>>>>> On 16/08/11 11:02, A Df wrote:
>>>>>> Hello All:
>>>>>>
>>>>>> I used a combination of tutorials to set up Hadoop, but most seem to use either an old version of Hadoop or only two machines for the cluster, which isn't really a cluster. Does anyone know of a good tutorial which sets up multiple nodes for a cluster? I already looked at the Apache website, but it does not give sample values for the conf files. Also, each set of tutorials seems to have a different set of parameters which they say should be changed, so now it is a bit confusing. For example, my configuration sets a dedicated namenode, a secondary namenode and 8 slave nodes, but when I run the start command it gives an error. Should I install Hadoop in my user directory or on the root? I have it in my directory, but all the nodes share a central file system as opposed to a distributed one, so whatever I do on one node in my user folder affects all the others. How do I set the paths to ensure that it uses a distributed system?
>>>>>>
>>>>>> For the errors below, I checked the directories and the files are there. I am not sure what went wrong or how to set the conf to not use a central file system. Thank you.
>>>>>>
>>>>>> Error message
>>>>>> CODE
>>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> bin/start-dfs.sh
>>>>>> bin/start-dfs.sh: line 28: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-config.sh: No such file or directory
>>>>>> bin/start-dfs.sh: line 50: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemon.sh: No such file or directory
>>>>>> bin/start-dfs.sh: line 51: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory
>>>>>> bin/start-dfs.sh: line 52: /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh: No such file or directory
>>>>>> CODE
>>>>>
>>>>> there's No such file or directory as /w1153435/hadoop-0.20.2_cluster/bin/hadoop-daemons.sh
>>>>>
>>>>> There is, I checked as shown
>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> ls bin
>>>>> hadoop             rcc                 start-dfs.sh     stop-dfs.sh
>>>>> hadoop-config.sh   slaves.sh           start-mapred.sh  stop-mapred.sh
>>>>> hadoop-daemon.sh   start-all.sh        stop-all.sh
>>>>> hadoop-daemons.sh  start-balancer.sh   stop-balancer.sh
>>>
>>>try "pwd" to print out where the OS thinks you are, as it doesn't seem to be where you think you are
>>>
>>>w1153435@ngs:~/hadoop-0.20.2_cluster> pwd
>>>/home/w1153435/hadoop-0.20.2_cluster
>>>
>>>w1153435@ngs:~/hadoop-0.20.2_cluster/bin> pwd
>>>/home/w1153435/hadoop-0.20.2_cluster/bin
>>>
>>>>>>
>>>>>> I had tried running this command below earlier but also got problems:
>>>>>> CODE
>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf
>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves
>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
>>>>>> -bash: /bin/slaves.sh: No such file or directory
>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster
>>>>>> w1153435@ngs:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
>>>>>> cat: /conf/slaves: No such file or directory
>>>>>> CODE
>>>>>>
>>>>> there's No such file or directory as /conf/slaves because you set HADOOP_HOME after setting the other env variables, which are expanded at set-time, not run-time.
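
(To illustrate the set-time expansion point, a standalone sketch, not from the original thread:)

CODE
# ${HADOOP_HOME} is substituted the moment the export line runs, so if it is
# still empty you get "/conf" baked in, and exporting HADOOP_HOME afterwards
# does not change the already-set variable.
$ unset HADOOP_HOME
$ export HADOOP_CONF_DIR=${HADOOP_HOME}/conf
$ echo $HADOOP_CONF_DIR
/conf
$ export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster
$ echo $HADOOP_CONF_DIR    # still the stale value; re-export it after HADOOP_HOME
/conf
CODE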
>>>>>
>>>>> I redid the commands but still have errors on the slaves:
>>>>>
>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> export HADOOP_HOME=/home/w1153435/hadoop-0.20.2_cluster
>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> export HADOOP_CONF_DIR=${HADOOP_HOME}/conf
>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> export HADOOP_SLAVES=${HADOOP_CONF_DIR}/slaves
>>>>> w1153435@n51:~/hadoop-0.20.2_cluster> ${HADOOP_HOME}/bin/slaves.sh "mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop"
>>>>> privn51: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>> privn58: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>> privn52: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>> privn55: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>> privn57: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>> privn54: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>> privn53: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>>> privn56: bash: mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop: No such file or directory
>>>
>>>try ssh-ing in, do it by hand, make sure you have the right permissions etc
>>>
>>>I reset the above path variables again and checked that they exist, and I tried the command above, but I get the same error. I can ssh with no problems and no password request, so that is fine. What else could be wrong?
>>>w1153435@ngs:~/hadoop-0.20.2_cluster> echo $HADOOP_HOME
>>>/home/w1153435/hadoop-0.20.2_cluster
>>>w1153435@ngs:~/hadoop-0.20.2_cluster> echo $HADOOP_CONF_DIR
>>>/home/w1153435/hadoop-0.20.2_cluster/conf
>>>w1153435@ngs:~/hadoop-0.20.2_cluster> echo $HADOOP_SLAVES
>>>/home/w1153435/hadoop-0.20.2_cluster/conf/slaves
>>>w1153435@ngs:~/hadoop-0.20.2_cluster>
>>
>
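
(Following the "do it by hand" suggestion above, a sketch only, using the same slaves file and target path as in the quoted commands:)

CODE
# Run the mkdir on each slave directly over ssh, one node at a time, so any
# real permission or path error is reported per host.
for host in $(cat /home/w1153435/hadoop-0.20.2_cluster/conf/slaves); do
  ssh "$host" mkdir -p /home/w1153435/hadoop-0.20.2_cluster/tmp/hadoop
done
CODE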
