[ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519484 ]
Doug Cutting commented on HADOOP-785:
-------------------------------------

A few comments:

1. A job is created by a client, so it gets values from hadoop-client.xml as well as job-specific things set by the application, right? Then the job is serialized as a job.xml and sent to a server. When it is read on the server, should any other configuration files be read at all? I think perhaps not. Job defaults, site specifics, etc. should all be determined at the client. If a value from hadoop-server.xml is to be considered, then the parameter is not client-overrideable. Conversely, if a value is client-overrideable, then the value in hadoop-server.xml will not be consulted; only the value in job.xml will be seen. A job.xml should contain a complete, standalone set of values, no? So there are two ways to create a JobConfiguration: one that reads hadoop-default.xml, hadoop-site.xml and hadoop-client.xml, and one that only reads job.xml.

2. Many parameters should be in either hadoop-client.xml or hadoop-server.xml, but not both. Thus we can organize the defaults into separate sections for client and server. Parameters that are used by both clients and servers can go in a "universal" section: these may be meaningfully added to the client, server or site configuration. The top-level organization of hadoop-default.xml can be by technology (hdfs, mapred, etc.), with sub-sections within each for universal, client and server parameters. This can provide folks a guide for where things are intended to be overridden.

> Divide the server and client configurations
> -------------------------------------------
>
>                 Key: HADOOP-785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>
> The configuration system is easy to misconfigure and I think we need to strongly divide the server from client configs.
> An example of the problem was a configuration where the task tracker has a hadoop-site.xml that set mapred.reduce.tasks to 1. Therefore, the job tracker had the right number of reduces, but the map task thought there was a single reduce. This led to a hard-to-diagnose failure.
> Therefore, I propose separating out the configuration types as:
>   class Configuration;
>     // reads site-default.xml, hadoop-default.xml
>   class ServerConf extends Configuration;
>     // reads hadoop-server.xml, $super
>   class DfsServerConf extends ServerConf;
>     // reads dfs-server.xml, $super
>   class MapRedServerConf extends ServerConf;
>     // reads mapred-server.xml, $super
>   class ClientConf extends Configuration;
>     // reads hadoop-client.xml, $super
>   class JobConf extends ClientConf;
>     // reads job.xml, $super
> Note in particular that nothing corresponds to hadoop-site.xml, which overrides both client and server configs. Furthermore, the properties from the *-default.xml files should never be saved into the job.xml.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
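The layering proposed above, and Doug's point that a job.xml must be a complete, standalone set of values, can be sketched in Java. This is a hypothetical illustration, not the actual Hadoop classes: the in-memory map stands in for reading the named XML resources, and the hard-coded property values are invented for the example. Each subclass layers its own resource's values on top of its parent's, and a JobConf reconstructed on the server reads only the serialized job.xml, never hadoop-site.xml or the defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed hierarchy; a HashMap stands in for
// the XML resources each class would read.
class Configuration {
    protected final Map<String, String> props = new HashMap<>();
    Configuration() {
        // stands in for reading hadoop-default.xml (values invented)
        props.put("mapred.reduce.tasks", "1");
        props.put("fs.default.name", "file:///");
    }
    String get(String key) { return props.get(key); }
    void set(String key, String value) { props.put(key, value); }
}

class ClientConf extends Configuration {
    ClientConf() {
        super();
        // stands in for reading hadoop-client.xml on top of the defaults
        props.put("mapred.reduce.tasks", "10");
    }
}

class JobConf extends ClientConf {
    // Client side: defaults + client overrides + application settings.
    JobConf() { super(); }

    // Server side: read ONLY the serialized job.xml; no other files are
    // consulted, so the job sees exactly what the client fixed.
    JobConf(Map<String, String> jobXml) {
        props.clear();
        props.putAll(jobXml);
    }
}

public class ConfDemo {
    public static void main(String[] args) {
        JobConf clientSide = new JobConf();
        clientSide.set("mapred.job.name", "wordcount");

        // "Serialize" the complete, standalone set of values as job.xml.
        Map<String, String> jobXml = new HashMap<>(clientSide.props);

        // The server reconstructs the job purely from job.xml, so a
        // mapred.reduce.tasks=1 in the server's hadoop-site.xml can no
        // longer leak into the job, avoiding the failure described above.
        JobConf serverSide = new JobConf(jobXml);
        System.out.println(serverSide.get("mapred.reduce.tasks")); // prints 10
        System.out.println(serverSide.get("mapred.job.name"));     // prints wordcount
    }
}
```

The key design point is the second JobConf constructor: because it clears and repopulates from job.xml alone, client-overrideable values are fixed once at the client, matching the rule that *-default.xml properties are never re-read (and should never be re-saved) on the server.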