Thanks very much for the explanation, and to confirm I will repeat it:
The first occurrence of a parameter is used, and the search order is:
hadoop-site.xml, then job.xml, then mapred-default.xml, then hadoop-default.xml.

That's great, and it explains behavior that had been confusing before.

It would indeed be good to rename mapred-default.xml to something that makes sense. I would suggest changing both the "mapred" part and the "default" part: "mapred" says little to set the file apart from "hadoop", and "default" doesn't do a good job of describing something that is site-specific rather than factory default.

On 6/20/06, Owen O'Malley <[EMAIL PROTECTED]> wrote:
On Jun 20, 2006, at 9:29 AM, Paul Sutter wrote:

> Speaking of configuration, is there any clear definition for the purpose of
> mapred-default.xml? My understanding is that it's an alternate, misnamed,
> site-local configuration, but we're not sure what to do with it.
>
> Right now, we make all of our changes to hadoop-site.xml, then copy that
> file to mapred-default.xml because we've heard that sometimes, that file
> gets checked instead of hadoop-site.xml.
>
> Any help appreciated

My general approach is that only things that the user/application should never change are in hadoop-site. Largely, this is limited to the namenode/jobtracker addresses, port, and directories. Everything else goes into mapred-default.xml. This includes things like:

dfs.block.size
io.sort.factor
io.sort.mb
etc.

This happens because of the load order of the config files: hadoop-default.xml, mapred-default.xml, job.xml, hadoop-site.xml. Files loaded later override earlier ones, so job.xml will override the default files, but NOT the hadoop-site.

I think that mapred-default would be better named site-default or something.

-- Owen
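P.S. For anyone else who was confused by this, the precedence rule in the thread can be sketched in a few lines of Python. This is only a toy model of the lookup, not Hadoop's actual Configuration code: the `resolve` helper, the dictionary layout, and all the property values are made up for illustration. Only the four file names and their load order come from Owen's explanation.

```python
# Load order as described in the thread; files loaded later override earlier ones.
LOAD_ORDER = [
    "hadoop-default.xml",
    "mapred-default.xml",
    "job.xml",
    "hadoop-site.xml",
]


def resolve(name, files):
    """Return (value, source_file) for the effective setting of `name`.

    `files` maps a config file name to a dict of its properties.
    Returns (None, None) if no file defines the property.
    """
    # Walking the load order backwards is the same as "first occurrence wins"
    # with the search order hadoop-site, job, mapred-default, hadoop-default.
    for filename in reversed(LOAD_ORDER):
        settings = files.get(filename, {})
        if name in settings:
            return settings[name], filename
    return None, None


# Hypothetical values, chosen only to show which file wins.
files = {
    "hadoop-default.xml": {"io.sort.mb": "100", "io.sort.factor": "10"},
    "mapred-default.xml": {"io.sort.mb": "200"},
    "job.xml": {"io.sort.mb": "300", "fs.default.name": "job-attempt:9000"},
    "hadoop-site.xml": {"fs.default.name": "namenode.example.com:9000"},
}

# job.xml overrides both default files.
print(resolve("io.sort.mb", files))
# job.xml cannot override hadoop-site.xml.
print(resolve("fs.default.name", files))
```

In this model, a value placed in mapred-default.xml acts as a site-wide default that a job can still override, while a value in hadoop-site.xml is effectively locked, which matches the split Owen describes.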
