Re: Configuration policy

Paul Sutter Tue, 20 Jun 2006 17:10:47 -0700

Thanks very much for the explanation, and to confirm I will repeat it:


The first occurrence of a parameter is used, and the search order is:

hadoop-site.xml, then
job.xml, then
mapred-default.xml, then
hadoop-default.xml
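
To make the rule concrete, here is a minimal sketch of a first-match lookup over that search order. It is not Hadoop's actual code: the `lookup` function, the dictionary layout, and all values are made up for illustration, and only the file names and the `io.sort.*` / `dfs.block.size` parameter names come from this thread.

```python
# Illustrative sketch only; this is not how Hadoop implements Configuration.
# The search order is the one described in this thread.
SEARCH_ORDER = [
    "hadoop-site.xml",
    "job.xml",
    "mapred-default.xml",
    "hadoop-default.xml",
]


def lookup(name, files):
    """Return (value, source_file) for the first file that defines `name`.

    `files` maps a file name to a dict of parameter -> value.
    Returns (None, None) if no file defines the parameter.
    """
    for filename in SEARCH_ORDER:
        settings = files.get(filename, {})
        if name in settings:
            return settings[name], filename
    return None, None


# Hypothetical contents; every value here is invented for the example.
files = {
    "hadoop-site.xml": {"mapred.job.tracker": "jt.example.com:9001"},
    "job.xml": {"io.sort.mb": "200", "mapred.job.tracker": "localhost:9001"},
    "mapred-default.xml": {"io.sort.mb": "100", "io.sort.factor": "10"},
    "hadoop-default.xml": {"io.sort.factor": "10", "dfs.block.size": "67108864"},
}

print(lookup("io.sort.mb", files))          # job.xml wins over the defaults
print(lookup("mapred.job.tracker", files))  # hadoop-site.xml wins over job.xml
```

With these made-up inputs, the job's `io.sort.mb` is picked up because hadoop-site.xml does not set it, while the job's attempt to change `mapred.job.tracker` is ignored.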

That's great, and it explains behavior that had been confusing before.

It would indeed be good to rename mapred-default.xml to something that makes
sense. I would suggest changing both the "mapred" part and the "default"
part: "mapred" says little to set the file apart from "hadoop", and
"default" doesn't do a good job of describing something that is site-specific
rather than a factory default.

On 6/20/06, Owen O'Malley <[EMAIL PROTECTED]> wrote:

On Jun 20, 2006, at 9:29 AM, Paul Sutter wrote:

> Speaking of configuration, is there any clear definition for the purpose of
> mapred-default.xml? My understanding is that it's an alternate, misnamed,
> site-local configuration, but we're not sure what to do with it.
>
> Right now, we make all of our changes to hadoop-site.xml, then copy that
> file to mapred-default.xml because we've heard that sometimes that file
> gets checked instead of hadoop-site.xml.
>
> Any help appreciated

My general approach is that only things that the user/application
should never change are in hadoop-site.xml. Largely, this is limited to the
namenode/jobtracker addresses, ports, and directories. Everything else
goes into mapred-default.xml. This includes things like:

dfs.block.size
io.sort.factor
io.sort.mb
etc....

This works because of the load order of the config files, where each file
overrides the ones loaded before it:

hadoop-default.xml, mapred-default.xml, job.xml, hadoop-site.xml

So job.xml will override the default files, but NOT hadoop-site.xml. I
think that mapred-default would be better named site-default or
something.
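
As a rough illustration of the load order (a sketch under assumptions, not the real Configuration class), the files can be modelled as dictionaries applied one after another, with later files overwriting earlier ones. The `load_config` helper and all values are hypothetical; only the file names and the `io.sort.mb` / `dfs.block.size` parameter names are taken from this message.

```python
# Illustrative sketch only: models "later file overrides earlier file".
LOAD_ORDER = [
    "hadoop-default.xml",
    "mapred-default.xml",
    "job.xml",
    "hadoop-site.xml",
]


def load_config(resources):
    """Merge per-file settings in LOAD_ORDER; the last file to set a key wins."""
    merged = {}
    for filename in LOAD_ORDER:
        merged.update(resources.get(filename, {}))
    return merged


# Hypothetical contents; the values are invented for the example.
resources = {
    "hadoop-default.xml": {"io.sort.mb": "100", "dfs.block.size": "67108864"},
    "mapred-default.xml": {"io.sort.mb": "150"},
    "job.xml": {"io.sort.mb": "200", "dfs.block.size": "134217728"},
    "hadoop-site.xml": {"dfs.block.size": "67108864"},
}

effective = load_config(resources)
print(effective["io.sort.mb"])      # the job's value survives
print(effective["dfs.block.size"])  # hadoop-site.xml has the last word
```

In this example the job can tune `io.sort.mb` because only the default files set it, but its `dfs.block.size` is overwritten because hadoop-site.xml is loaded last. That is why a setting the job should be able to change belongs in mapred-default.xml rather than hadoop-site.xml.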

-- Owen
