At last I managed to get this working along the lines of what I wanted it
to do. I had to modify the sample to set the property explicitly. I did
jobConf.set("mapred.job.tracker", "<host:port>").
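In context, the change looks roughly like this (a sketch, not the full sample: the class name is illustrative, and the jobtracker address is the one from the hadoop-site.xml below):

```java
import org.apache.hadoop.mapred.JobConf;

public class SubmitWordCount {
    public static void main(String[] args) {
        // Configure the job as in the WordCount sample...
        JobConf jobConf = new JobConf();
        jobConf.setJobName("wordcount");

        // ...then point it explicitly at the remote jobtracker, so the
        // client does not fall back to the in-process LocalJobRunner
        // (which is what the default value, "local", selects).
        jobConf.set("mapred.job.tracker", "dev030.sctm.com:50029");
    }
}
```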
If my Map job is going to process a file, does it have to be in HDFS, and
if so, how do I get it there? Is there any resource I can read to get a
better understanding?
Thanks
Avinash
On 5/25/07, Phantom <[EMAIL PROTECTED]> wrote:
Here is a copy of my hadoop-site.xml. What am I doing wrong ?
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>dev030.sctm.com:9000</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/tmp/hadoop</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>dev030.sctm.com:50029</value>
  </property>
  <property>
    <name>mapred.job.tracker.info.port</name>
    <value>50030</value>
  </property>
  <property>
    <name>mapred.min.split.size</name>
    <value>65536</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
On 5/25/07, Vishal Shah <[EMAIL PROTECTED]> wrote:
>
> Hi Avinash,
>
> Can you share your hadoop-site.xml, mapred-default.xml and slaves
> files?
> Most probably, you have not set the jobtracker properly in the
> hadoop-site.xml conf file. Check the mapred.job.tracker property in
> your file. It should look something like this:
>
> <property>
>   <name>mapred.job.tracker</name>
>   <value>fully.qualified.domainname:40000</value>
>   <description>The host and port that the MapReduce job tracker runs
>   at. If "local", then jobs are run in-process as a single map
>   and reduce task.
>   </description>
> </property>
>
> -vishal.
>
> -----Original Message-----
> From: Mahadev Konar [mailto:[EMAIL PROTECTED]]
> Sent: Friday, May 25, 2007 5:54 AM
> To: [email protected]
> Subject: RE: Configuration and Hadoop cluster setup
>
> Hi,
> When you run the job, you need to set the environment variable
> HADOOP_CONF_DIR to the configuration directory containing the
> configuration file that points to the right jobtracker.
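> For example, something like this (the path below is a placeholder):

```shell
# Point the job client at the cluster configuration before submitting.
# The path is a placeholder -- use the directory that actually contains
# the hadoop-site.xml pointing at the remote jobtracker.
export HADOOP_CONF_DIR=/path/to/hadoop/conf
```

> With that set, submitting the example job (e.g. via bin/hadoop jar)
> should pick up mapred.job.tracker from that directory instead of
> defaulting to the LocalJobRunner.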
>
> Regards
> Mahadev
>
> > -----Original Message-----
> > From: Phantom [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, May 24, 2007 4:51 PM
> > To: [email protected]
> > Subject: Re: Configuration and Hadoop cluster setup
> >
> > Yes, the files are the same, and I am starting the tasks on the namenode
> > server. I also figured out what my problem was with respect to not being
> > able to start the namenode and job tracker on the same machine: I had to
> > reformat the file system. But all this still doesn't cause the WordCount
> > sample to run in a distributed fashion. I can tell because the
> > LocalJobRunner is being used. Do I need to specify the config file to
> > the running instance of the program? If so, how do I do that?
> >
> > Thanks
> > A
> >
> > On 5/24/07, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> > >
> > >
> > >
> > > Phantom wrote:
> > > > I am trying to run Hadoop on a cluster of 3 nodes. The namenode and
> > > > the jobtracker web UI work. I have the namenode running on node A
> > > > and the job tracker running on node B. Is it true that the namenode
> > > > and jobtracker cannot run on the same box?
> > >
> > > The namenode and the jobtracker can most definitely run on the same
> > > box. As far as I know, this is the preferred configuration.
> > >
> > > > Also, if I want to run the examples on the cluster, is there
> > > > anything special that needs to be done? When I run the example
> > > > WordCount on machine C (which is a task tracker and not a job
> > > > tracker), the LocalJobRunner is invoked all the time. I am guessing
> > > > this means the map tasks are running locally. How can I distribute
> > > > this on the cluster? Please advise.
> > >
> > > Are the conf files on machine C the same as on the namenode/jobtracker?
> > > Are they pointing to the namenode and jobtracker, or are they pointing
> > > to local in the hadoop-site.xml file? Also, we have found it easier
> > > (although not necessarily better) to start tasks on the namenode
> > > server.
> > >
> > > It would be helpful to have more information about what is happening
> > > and about your setup, as that would help me and others on the list
> > > debug what may be occurring.
> > >
> > > Dennis Kubes
> > >
> > > >
> > > > Thanks
> > > > Avinash
> > > >
> > >
>
>