Thank you very much! It's clear to me now.

2009/8/20 Aaron Kimball <aa...@cloudera.com>

> On Wed, Aug 19, 2009 at 8:39 PM, yang song <hadoop.ini...@gmail.com>
> wrote:
>
> >    Thank you, Aaron. I've benefited a lot. "Per-node" means settings
> > associated with the node, e.g., "fs.default.name", "mapred.job.tracker",
> > etc. "Per-job" means settings associated with the jobs that are
> > submitted from the node, e.g., "mapred.reduce.tasks". That means if I set
> > "per-job" properties on the JobTracker, they won't take effect. Is my
> > understanding right?
>
>
> It will work if you submit your job (run "hadoop jar ....") from the
> JobTracker node :) It won't if you submit your job from elsewhere.
>
>
> >
> >    In addition, when I add some new properties, e.g.,
> > "mapred.inifok.setting", on the JobTracker, I can find it in every
> > job.xml from the WebUI. I think all jobs will use the new properties. Is
> > that right?
>
>
> If you set a property programmatically when configuring your job, that will
> be available in the JobConf on all machines for that job only. If you set a
> property in your hadoop-site.xml on the submitting machine, then I think
> that will also be available for the job on all nodes.
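A minimal Java sketch of the behaviour described above, assuming a simplified model rather than Hadoop's real classes: the submitting client freezes its resolved settings into a per-job snapshot (standing in for job.xml), and every task of that job reads the same snapshot whatever node it runs on. The names JobSnapshotDemo, JobSnapshot and readOnNode are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (hypothetical, not Hadoop's API): the submitter's settings are
 * copied into an immutable per-job snapshot, a stand-in for job.xml.
 */
public class JobSnapshotDemo {

    /** Immutable per-job settings shipped to every node. */
    static final class JobSnapshot {
        private final Map<String, String> settings;

        JobSnapshot(Map<String, String> settings) {
            this.settings = Map.copyOf(settings); // frozen at submission time
        }

        String get(String name) {
            return settings.get(name);
        }
    }

    /** A task reads its own job's snapshot; which node it runs on is irrelevant. */
    static String readOnNode(String nodeName, JobSnapshot job, String name) {
        return job.get(name);
    }

    public static void main(String[] args) {
        Map<String, String> submitterConf = new HashMap<>();
        submitterConf.put("mapred.inifok.setting", "on"); // property name from the thread
        JobSnapshot job1 = new JobSnapshot(submitterConf);
        JobSnapshot job2 = new JobSnapshot(new HashMap<>()); // a job submitted without it

        System.out.println(readOnNode("node-1", job1, "mapred.inifok.setting")); // on
        System.out.println(readOnNode("node-7", job1, "mapred.inifok.setting")); // on
        System.out.println(readOnNode("node-1", job2, "mapred.inifok.setting")); // null
    }
}
```

Because the snapshot is copied at submission time, later edits to the submitter's settings do not reach a job that has already been submitted, and each job only sees what its own submitter put into it.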
>
> - Aaron
>
>
> >
> >    Thanks again.
> >    Inifok
> >
> > 2009/8/20 Aaron Kimball <aa...@cloudera.com>
> >
> > > Hi Inifok,
> > >
> > > This is a confusing aspect of Hadoop, I'm afraid.
> > >
> > > Settings are divided into two categories: "per-job" and "per-node."
> > > Unfortunately, which settings fall into which category isn't documented.
> > >
> > > Some settings are applied to the node that is being used. For example,
> > > if you set fs.default.name on a node to "hdfs://some.namenode:8020/",
> > > then any FS connections you make from that node will go to
> > > some.namenode. If a different machine in your cluster has
> > > fs.default.name set to hdfs://other.namenode, then that machine will
> > > connect to a different namenode.
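To illustrate that each node resolves fs.default.name from its own file, here is a simplified Java reader for the configuration/property/name/value layout of hadoop-site.xml, using only the JDK's DOM parser. It is a sketch under assumptions, not Hadoop's Configuration class: it returns the first matching property and ignores details such as final parameters and ${...} variable expansion. The class and helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Simplified, hypothetical reader for one node's hadoop-site.xml. */
public class PerNodeSiteFile {

    /** Returns the value of the first <property> whose <name> matches, or null if none does. */
    static String lookup(String siteXml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String propName = prop.getElementsByTagName("name").item(0).getTextContent().trim();
                if (propName.equals(name)) {
                    return prop.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse site file", e);
        }
    }

    /** Builds a one-property site file pointing fs.default.name at the given namenode. */
    static String siteFileFor(String namenode) {
        return "<configuration><property><name>fs.default.name</name>"
                + "<value>" + namenode + "</value></property></configuration>";
    }

    public static void main(String[] args) {
        // Each machine has its own (hypothetical) file, so each resolves its own namenode.
        String onMachineX = siteFileFor("hdfs://some.namenode:8020/");
        String onMachineY = siteFileFor("hdfs://other.namenode");
        System.out.println(lookup(onMachineX, "fs.default.name")); // hdfs://some.namenode:8020/
        System.out.println(lookup(onMachineY, "fs.default.name")); // hdfs://other.namenode
    }
}
```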
> > >
> > > Another example of a per-machine setting is
> > > mapred.tasktracker.map.tasks.maximum, which tells a tasktracker the
> > > maximum number of map tasks it should run in parallel. Each tasktracker
> > > is free to configure this value differently, e.g., if you have some
> > > quad-core and some eight-core machines. Similarly, dfs.data.dir tells a
> > > datanode where its data directories should be kept; naturally, this can
> > > vary from machine to machine.
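As a small worked example of why per-node values matter, assume (hypothetically) that each tasktracker sets mapred.tasktracker.map.tasks.maximum to its core count. The cluster can then run at most the sum of the per-node values as map tasks at once. The host names and values are made up, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PerNodeSlots {

    /** Cluster-wide concurrent map tasks = sum of each tasktracker's own maximum. */
    static int totalMapSlots(Map<String, Integer> perNodeMaximum) {
        int total = 0;
        for (int max : perNodeMaximum.values()) {
            total += max;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical hosts; each value would come from that host's own site file.
        Map<String, Integer> maxByNode = new LinkedHashMap<>();
        maxByNode.put("quad-core-1", 4);
        maxByNode.put("quad-core-2", 4);
        maxByNode.put("eight-core-1", 8);
        System.out.println(totalMapSlots(maxByNode)); // 16
    }
}
```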
> > >
> > > Other settings are applied to a job as a whole. These settings are
> > > configured when you submit the job. So if you write
> > > conf.setInt("mapred.reduce.parallel.copies", 20) in your code, that will
> > > be the setting for the job. Settings that you don't explicitly put in
> > > your code are drawn from the hadoop-site.xml file on the machine the job
> > > is submitted from.
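The lookup order described above, applied to the original question's scenario, can be sketched as a toy Java model (hypothetical names, not Hadoop's API): a value set in the job's code wins, then the submitting machine's hadoop-site.xml, then the built-in default of 5 reported in the thread. The JobTracker's site file on machine B is never consulted, which is why editing it had no effect. The sketch ignores refinements such as parameters marked final.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical) of how a per-job setting is resolved at submission. */
public class JobSettingResolution {

    /** Code beats the submitter's site file, which beats the default. */
    static String resolve(String name,
                          Map<String, String> setInCode,
                          Map<String, String> submitterSite,
                          Map<String, String> defaults) {
        if (setInCode.containsKey(name)) return setInCode.get(name);
        if (submitterSite.containsKey(name)) return submitterSite.get(name);
        return defaults.get(name);
    }

    public static void main(String[] args) {
        String key = "mapred.reduce.parallel.copies";
        Map<String, String> defaults = Map.of(key, "5");
        Map<String, String> siteOnA = new HashMap<>(); // submitting machine
        Map<String, String> siteOnB = new HashMap<>(); // JobTracker machine
        Map<String, String> code = new HashMap<>();

        siteOnB.put(key, "10"); // edited on B: not part of the lookup at all
        System.out.println(resolve(key, code, siteOnA, defaults)); // 5

        siteOnA.put(key, "10"); // edited on A, the submitter
        System.out.println(resolve(key, code, siteOnA, defaults)); // 10

        code.put(key, "20"); // like conf.setInt(key, 20) in the job's code
        System.out.println(resolve(key, code, siteOnA, defaults)); // 20
    }
}
```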
> > >
> > > In general, I strongly recommend you save yourself some pain by keeping
> > > your configuration files as identical as possible :)
> > > Good luck,
> > > - Aaron
> > >
> > >
> > > On Wed, Aug 19, 2009 at 7:21 AM, yang song <hadoop.ini...@gmail.com>
> > > wrote:
> > >
> > > > Hello, everybody,
> > > >    I'm puzzled about setting properties in hadoop-site.xml.
> > > >    Suppose I submit the job from machine A, and the JobTracker runs on
> > > > machine B, so there are two hadoop-site.xml files. Now I increase
> > > > "mapred.reduce.parallel.copies" (e.g., to 10) on machine B because I
> > > > want to make the copy phase faster. However, the
> > > > "mapred.reduce.parallel.copies" value shown in the WebUI is still 5.
> > > > When I increase it on machine A, it changes. Why doesn't it work when I
> > > > change it on B? What's more, when I add new properties on B, those
> > > > properties do show up in the WebUI. So why can't I change properties
> > > > through machine B? Must some properties be changed through A and
> > > > others through B?
> > > >    Thank you!
> > > >    Inifok
> > > >
> > >
> >
>
