Both options 1 and 3 will work.

On Wed, Nov 3, 2010 at 9:28 PM, James Seigel <[email protected]> wrote:
> Option 1 = good
>
> Sent from my mobile. Please excuse the typos.
>
> On 2010-11-03, at 8:27 PM, "shangan" <[email protected]> wrote:
>
>> I don't think the first two options can work; even if you stop the
>> tasktracker, these to-be-retired nodes are still connected to the
>> namenode. Option 3 can work. You only need to add the exclude file on
>> the namenode, and it is a regular file. Add a key named
>> dfs.hosts.exclude to your conf/hadoop-site.xml file. The value
>> associated with this key provides the full path to a file on the
>> NameNode's local file system which contains a list of machines that
>> are not permitted to connect to HDFS.
>>
>> Then you can run the command bin/hadoop dfsadmin -refreshNodes, and
>> the cluster will decommission the nodes listed in the exclude file.
>> This might take some time, as the cluster needs to move data from the
>> retired nodes to the remaining nodes.
>>
>> After this you can use the retired nodes as a new cluster. But
>> remember to remove those nodes from the slaves file; you can delete
>> the exclude file afterward.
>>
>> 2010-11-04
>>
>> shangan
>>
>> From: Raj V
>> Sent: 2010-11-04 10:05:44
>> To: common-user
>> Cc:
>> Subject: Two questions.
>>
>>> 1. I have a 512 node cluster. I need to have 32 nodes do something
>>> else. They can be datanodes, but I cannot run any map or reduce jobs
>>> on them. So I see three options:
>>> 1. Stop the tasktracker on those nodes; leave the datanode running.
>>> 2. Set mapred.tasktracker.reduce.tasks.maximum and
>>> mapred.tasktracker.map.tasks.maximum to 0 on these nodes and mark
>>> these properties final.
>>> 3. Use the parameter mapred.hosts.exclude.
>>> I am assuming that any of the three methods would work. To start
>>> with, I went with option 3. I used a local file
>>> /home/hadoop/myjob.exclude, and the file myjob.exclude had the
>>> hostname of one host per line (hadoop-480 .. hadoop-511).
>>> But I see both map and reduce jobs being scheduled to all the 511
>>> nodes. I understand there is an inherent inefficiency in running
>>> only the datanode on these 32 nodes.
>>> Here are my questions:
>>> 1. Will all three methods work?
>>> 2. If I choose method 3, does this file exist as a dfs file or a
>>> regular file? If a regular file, does it need to exist on all the
>>> nodes, or only on the node where the job is submitted?
>>> Many thanks in advance,
>>> Raj
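For reference, the setup shangan describes for option 3 would look roughly like the sketch below. It reuses Raj's file path /home/hadoop/myjob.exclude from the thread; the mapred.hosts.exclude entry is an assumption about how the MapReduce side mirrors the HDFS side, not something spelled out in the thread.

```xml
<!-- conf/hadoop-site.xml on the master node(s).
     dfs.hosts.exclude is read by the NameNode; the value must be the
     full path to a plain-text file on the NameNode's local filesystem,
     one hostname per line (e.g. hadoop-480 through hadoop-511). -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/myjob.exclude</value>
</property>

<!-- Assumed analogue for the JobTracker, so the same hosts are also
     excluded from running map/reduce tasks. -->
<property>
  <name>mapred.hosts.exclude</name>
  <value>/home/hadoop/myjob.exclude</value>
</property>
```

After editing the config and the exclude file, the thread's stated next step is `bin/hadoop dfsadmin -refreshNodes` on the namenode, which starts decommissioning the listed datanodes. Note that the exclude file is only consulted on the node running the daemon that reads it, which answers Raj's second question: it is a regular local file and does not need to exist on every node.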
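Option 2 as Raj describes it would be a per-node config fragment like this sketch; whether setting the slot counts to zero actually keeps the jobtracker from using those tasktrackers is exactly the point shangan disputes above, so treat this as illustrative only.

```xml
<!-- conf/hadoop-site.xml on each of the 32 nodes to be withheld
     from MapReduce. <final>true</final> prevents job configurations
     from overriding these values. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>0</value>
  <final>true</final>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>0</value>
  <final>true</final>
</property>
```

The tasktrackers on these nodes must be restarted for the new slot counts to take effect.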
