Both options 1 and 3 will work.

On Wed, Nov 3, 2010 at 9:28 PM, James Seigel <[email protected]> wrote:
> Option 1 = good
>
> Sent from my mobile. Please excuse the typos.
>
> On 2010-11-03, at 8:27 PM, "shangan" <[email protected]> wrote:
>
>> I don't think the first two options can work; even if you stop the
>> tasktracker, these to-be-retired nodes are still connected to the
>> namenode. Option 3 can work. You only need to add the exclude file on
>> the namenode, and it is a regular file. Add a key named
>> dfs.hosts.exclude to your conf/hadoop-site.xml file. The value
>> associated with this key provides the full path to a file on the
>> NameNode's local file system which contains a list of machines that
>> are not permitted to connect to HDFS.
>>
>> Then you can run the command bin/hadoop dfsadmin -refreshNodes, and
>> the cluster will decommission the nodes listed in the exclude file.
>> This might take some time, as the cluster needs to move data from the
>> retired nodes to the remaining nodes.
>>
>> After this you can use the retired nodes as a new cluster. But
>> remember to remove those nodes from the slaves file; you can delete
>> the exclude file afterward.
>>
>> 2010-11-04
>>
>> shangan
>>
>> From: Raj V
>> Sent: 2010-11-04 10:05:44
>> To: common-user
>> Cc:
>> Subject: Two questions.
>>
>>> 1. I have a 512 node cluster. I need to have 32 nodes do something
>>> else. They can be datanodes, but I cannot run any map or reduce jobs
>>> on them. So I see three options:
>>> 1. Stop the tasktracker on those nodes; leave the datanode running.
>>> 2. Set mapred.tasktracker.reduce.tasks.maximum and
>>> mapred.tasktracker.map.tasks.maximum to 0 on these nodes and mark
>>> these properties final.
>>> 3. Use the parameter mapred.hosts.exclude.
>>> I am assuming that any of the three methods would work. To start
>>> with, I went with option 3. I used a local file
>>> /home/hadoop/myjob.exclude, and the file myjob.exclude had the
>>> hostname of one host per line (hadoop-480 .. hadoop-511).
>>> But I see both map and reduce jobs being scheduled to all the 511
>>> nodes. I understand there is an inherent inefficiency in running
>>> only the datanode on these 32 nodes.
>>> Here are my questions:
>>> 1. Will all three methods work?
>>> 2. If I choose method 3, does this file exist as a dfs file or a
>>> regular file? If a regular file, does it need to exist on all the
>>> nodes, or only on the node where the job is submitted?
>>> Many thanks in advance,
>>> Raj
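For reference, the setup shangan describes for option 3 would look roughly like the sketch below. It reuses Raj's file path /home/hadoop/myjob.exclude from the thread; the mapred.hosts.exclude entry is an assumption about how the MapReduce side mirrors the HDFS side, not something spelled out in the thread.

```xml
<!-- conf/hadoop-site.xml on the master node(s).
     dfs.hosts.exclude is read by the NameNode; the value must be the
     full path to a plain-text file on the NameNode's local filesystem,
     one hostname per line (e.g. hadoop-480 through hadoop-511). -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/myjob.exclude</value>
</property>

<!-- Assumed analogue for the JobTracker, so the same hosts are also
     excluded from running map/reduce tasks. -->
<property>
  <name>mapred.hosts.exclude</name>
  <value>/home/hadoop/myjob.exclude</value>
</property>
```

After editing the config and the exclude file, the thread's stated next step is `bin/hadoop dfsadmin -refreshNodes` on the namenode, which starts decommissioning the listed datanodes. Note that the exclude file is only consulted on the node running the daemon that reads it, which answers Raj's second question: it is a regular local file and does not need to exist on every node.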
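Option 2 as Raj describes it would be a per-node config fragment like this sketch; whether setting the slot counts to zero actually keeps the jobtracker from using those tasktrackers is exactly the point shangan disputes above, so treat this as illustrative only.

```xml
<!-- conf/hadoop-site.xml on each of the 32 nodes to be withheld
     from MapReduce. <final>true</final> prevents job configurations
     from overriding these values. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>0</value>
  <final>true</final>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>0</value>
  <final>true</final>
</property>
```

The tasktrackers on these nodes must be restarted for the new slot counts to take effect.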
