Thank you very much! I'm clear about it now.

2009/8/20 Aaron Kimball <aa...@cloudera.com>
> On Wed, Aug 19, 2009 at 8:39 PM, yang song <hadoop.ini...@gmail.com> wrote:
>
> > Thank you, Aaron. I've benefited a lot. "per-node" means settings
> > associated with the node, e.g., "fs.default.name", "mapred.job.tracker",
> > etc. "per-job" means settings associated with the jobs which are
> > submitted from the node, e.g., "mapred.reduce.tasks". That means, if I
> > set "per-job" properties on the JobTracker, they won't work. Is my
> > understanding right?
>
> It will work if you submit your job (run "hadoop jar ....") from the
> JobTracker node :) It won't if you submit your job from elsewhere.
>
> > In addition, when I add some new properties, e.g.,
> > "mapred.inifok.setting", on the JobTracker, I can find them in every
> > job.xml from the WebUI. I think all jobs will use the new properties.
> > Is that right?
>
> If you set a property programmatically when configuring your job, that will
> be available in the JobConf on all machines for that job only. If you set a
> property in your hadoop-site.xml on the submitting machine, then I think
> that will also be available for the job on all nodes.
>
> - Aaron
>
> > Thanks again.
> > Inifok
> >
> > 2009/8/20 Aaron Kimball <aa...@cloudera.com>
> >
> > > Hi Inifok,
> > >
> > > This is a confusing aspect of Hadoop, I'm afraid.
> > >
> > > Settings are divided into two categories: "per-job" and "per-node."
> > > Unfortunately, which are which isn't documented.
> > >
> > > Some settings are applied to the node that is being used. So, for
> > > example, if you set fs.default.name on a node to be
> > > "hdfs://some.namenode:8020/", then any FS connections you make from
> > > that node will go to some.namenode. If a different machine in your
> > > cluster has fs.default.name set to hdfs://other.namenode, then that
> > > machine will connect to a different namenode.
> > >
> > > Another example of a per-machine setting is
> > > mapred.tasktracker.map.tasks.maximum; this tells a tasktracker the
> > > maximum number of tasks it should run in parallel. Each tasktracker is
> > > free to configure this value differently, e.g., if you have some
> > > quad-core and some eight-core machines. dfs.data.dir tells a datanode
> > > where its data directories should be kept. Naturally, this can vary
> > > machine-to-machine.
> > >
> > > Other settings are applied to a job as a whole. These settings are
> > > configured when you submit the job. So if you write
> > > conf.setInt("mapred.reduce.parallel.copies", 20) in your code, this
> > > will be the setting for the job. Settings that you don't explicitly put
> > > in your code are drawn from the hadoop-site.xml file on the machine the
> > > job is submitted from.
> > >
> > > In general, I strongly recommend you save yourself some pain by keeping
> > > your configuration files as identical as possible :)
> > > Good luck,
> > > - Aaron
> > >
> > > On Wed, Aug 19, 2009 at 7:21 AM, yang song <hadoop.ini...@gmail.com>
> > > wrote:
> > >
> > > > Hello, everybody
> > > > I feel puzzled about setting properties in hadoop-site.xml.
> > > > Suppose I submit the job from machine A, and the JobTracker runs on
> > > > machine B, so there are two hadoop-site.xml files. Now, I increase
> > > > "mapred.reduce.parallel.copies" (e.g. to 10) on machine B since I
> > > > want to make the copy phase faster. However,
> > > > "mapred.reduce.parallel.copies" from the WebUI is still 5. When I
> > > > increase it on machine A, it changes. So I feel very puzzled. Why
> > > > doesn't it work when I change it on B? What's more, when I add some
> > > > properties on B, those properties can be found on the WebUI. So why
> > > > can't I change properties through machine B? Must some properties be
> > > > changed through A and others through B?
> > > > Thank you!
> > > > Inifok
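To make the per-job vs. per-node distinction above concrete, here is a minimal sketch of a job submission using the 0.20-era org.apache.hadoop.mapred API that this thread discusses. The class name, job name, and input/output paths are hypothetical, not from the thread; the point is that whatever ends up in the JobConf at submission time (the submitting machine's hadoop-site.xml plus any programmatic overrides) is what the job runs with, while per-node settings such as mapred.tasktracker.map.tasks.maximum still come from each node's own configuration.

// A minimal sketch, not from the original thread: assumes the Hadoop 0.20-era
// "old" mapred API (JobConf/JobClient) and hypothetical class, job, and path names.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class PerJobSettingsExample {
  public static void main(String[] args) throws Exception {
    // The JobConf starts from the hadoop-site.xml of the *submitting* machine
    // (machine A in the thread), not from the JobTracker's copy.
    JobConf conf = new JobConf(PerJobSettingsExample.class);
    conf.setJobName("per-job-settings-example");

    // Per-job overrides: these travel with the job and show up in its job.xml
    // on the WebUI, regardless of what the JobTracker's hadoop-site.xml says.
    conf.setNumReduceTasks(10);                        // mapred.reduce.tasks
    conf.setInt("mapred.reduce.parallel.copies", 20);  // parallel copies in the copy phase

    // The identity mapper/reducer are used by default; a real job would call
    // conf.setMapperClass(...) and conf.setReducerClass(...) here.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Submit. Per-node settings such as mapred.tasktracker.map.tasks.maximum or
    // dfs.data.dir are still read from each node's own configuration files.
    RunningJob job = JobClient.runJob(conf);
    System.out.println("Job successful: " + job.isSuccessful());
  }
}

Run it with "hadoop jar <your-jar> PerJobSettingsExample <input> <output>" from any machine (the jar name is hypothetical); the job.xml you then see on the WebUI reflects that machine's hadoop-site.xml plus the two overrides above, which matches the behavior Inifok observed.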