[ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520057 ]
Doug Cutting commented on HADOOP-785:
-------------------------------------

Sameer, I think your proposal is mostly isomorphic to Arun's, but with redundancy. The stuff in your hadoop-server.xml and hadoop.client.override is the same as in Arun's hadoop-final.xml on servers. hadoop-final.xml would not normally exist on clients, only on servers. So the differences between your proposals are (a) the names of the files, and (b) that you also want to list non-overridable parameter names in a parameter. The latter seems fragile and hard to maintain to me.

It does make sense to have overridable values on the server too, e.g., to determine the default block size for client programs that don't override it. Under Arun's proposal this would be in hadoop-initial.xml on the servers. Where would it be in your proposal? As items in hadoop-server.xml that are not named in hadoop.client.override? Is this really less confusing?

Another issue with your proposal is that it requires different Configuration construction code on clients and servers. Do we always know, everywhere that a Configuration is created, whether we are running as a client or a server? Our servers use much of our client code: a MapReduce server is an HDFS client, etc. I think this is more reliably done by using uniform Configuration construction code, and simply configuring server hosts differently from client hosts, if different configurations are even required. In most cases this should not be required, since clients have little need to specify non-overridable values; hence hadoop-final.xml should generally exist only on servers.

We're mostly talking about host-specific settings, not server/client distinctions. Some things should not be overridden because they're specific to the host. Thus they should be overridden by a file on that host whose sole purpose is to do this. This concept makes sense on both client and server machines.

> Divide the server and client configurations
> -------------------------------------------
>
>                 Key: HADOOP-785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>
> The configuration system is easy to misconfigure, and I think we need to strongly divide the server from client configs.
> An example of the problem was a configuration where the task tracker had a hadoop-site.xml that set mapred.reduce.tasks to 1. Therefore, the job tracker had the right number of reduces, but the map task thought there was a single reduce. This led to a hard-to-diagnose failure.
> Therefore, I propose separating out the configuration types as:
> class Configuration;
>     // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;
>     // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;
>     // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;
>     // reads mapred-server.xml, $super
> class ClientConf extends Configuration;
>     // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;
>     // reads job.xml, $super
> Note in particular that nothing corresponds to hadoop-site.xml, which overrides both client and server configs. Furthermore, the properties from the *-default.xml files should never be saved into the job.xml.
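
To make the uniform-construction argument concrete, below is a minimal sketch of the layering Doug describes: every process, client or server, runs the same resource-loading code, and keys set by a final resource (hadoop-final.xml, normally present only on server hosts) cannot be overridden by resources loaded later, such as a job.xml submitted by a client. The class name LayeredConf, the addResource signature, and the sample property values are assumptions made for illustration; this is not the actual Hadoop Configuration API.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only -- not Hadoop's Configuration class.
public class LayeredConf {
  private final Map<String, String> props = new HashMap<String, String>();
  private final Set<String> finals = new HashSet<String>();

  // Later resources override earlier ones, except for keys already
  // loaded from a final resource (e.g. hadoop-final.xml on a server).
  public void addResource(Map<String, String> resource, boolean isFinal) {
    for (Map.Entry<String, String> e : resource.entrySet()) {
      if (finals.contains(e.getKey())) {
        continue; // host-specific final value wins; override ignored
      }
      props.put(e.getKey(), e.getValue());
      if (isFinal) {
        finals.add(e.getKey());
      }
    }
  }

  public String get(String name, String defaultValue) {
    String v = props.get(name);
    return v == null ? defaultValue : v;
  }

  public static void main(String[] args) {
    LayeredConf conf = new LayeredConf();

    // Identical loading code everywhere; only the files present differ.
    Map<String, String> defaults = new HashMap<String, String>();
    defaults.put("mapred.reduce.tasks", "1");     // hadoop-default.xml
    defaults.put("dfs.block.size", "67108864");
    conf.addResource(defaults, false);

    Map<String, String> hostFinal = new HashMap<String, String>();
    hostFinal.put("dfs.data.dir", "/grid/0/dfs"); // hadoop-final.xml: server host only
    conf.addResource(hostFinal, true);

    Map<String, String> job = new HashMap<String, String>();
    job.put("mapred.reduce.tasks", "10");         // job.xml from a client
    job.put("dfs.data.dir", "/tmp/dfs");          // attempted override of a final value
    conf.addResource(job, false);

    System.out.println(conf.get("mapred.reduce.tasks", "?")); // 10: overridable, client wins
    System.out.println(conf.get("dfs.data.dir", "?"));        // /grid/0/dfs: final, host wins
  }
}

Note the contrast with the ServerConf/ClientConf class split quoted above: in this sketch nothing needs to know at construction time whether it is running as a client or a server, which matters because a MapReduce server is itself an HDFS client.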