[ http://issues.apache.org/jira/browse/HADOOP-785?page=comments#action_12456192 ]
Doug Cutting commented on HADOOP-785:
-------------------------------------
I think this is the right direction. We logically have a tree. Each node
corresponds to a config file that inherits from and overrides its parent file.
Users need to be able to easily (1) remember the tree and (2) know where in
the tree to specify a property.
I propose that the tree is organized around *where* in the cluster things are
used, not *what* part of the code they configure (that's determined by the
parameter name). This addresses the primary source of confusion, and thus is
what we must clarify. In particular we should distinguish between things used
only by servers, and things that clients may specify.
I propose the following tree:
  default -- read-only defaults for things that clients can override
    site -- site-specific defaults
      server-default -- read-only defaults for server-only configuration
        server -- server overrides for this site
      client -- user overrides
The read-only default files serve as documentation of what parameters can be
added to files lower in the tree. It is a configuration error to specify
something that does not have a default value above it.
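A minimal sketch of the rule above, assuming each file is modelled as an in-memory node with a link to its parent. The class name ConfigNode and its methods are hypothetical and not part of Hadoop; the sketch also assumes that "a default value above it" means any ancestor node defines the key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the proposed tree; not Hadoop's API.
final class ConfigNode {
    private final String name;
    private final ConfigNode parent;         // null for the root ("default")
    private final boolean declaresDefaults;  // true for the read-only default files
    private final Map<String, String> values = new HashMap<>();

    ConfigNode(String name, ConfigNode parent, boolean declaresDefaults) {
        this.name = name;
        this.parent = parent;
        this.declaresDefaults = declaresDefaults;
    }

    // Default files may introduce new keys; every other file may only
    // override a key that some ancestor already defines.
    void set(String key, String value) {
        boolean definedAbove = parent != null && parent.get(key) != null;
        if (!declaresDefaults && !definedAbove) {
            throw new IllegalArgumentException(
                "'" + key + "' in '" + name + "' has no default value above it");
        }
        values.put(key, value);
    }

    // The nearest definition wins: this node first, then its ancestors.
    String get(String key) {
        String value = values.get(key);
        if (value != null) {
            return value;
        }
        return parent == null ? null : parent.get(key);
    }
}
```

With nodes wired as default, site, server-default, server and client, setting dfs.data.dir on the client node is rejected, because that key is only declared in server-default, which is not above client.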
Some examples of what might be in the three non-read-only files:
  site -- site-specific defaults
    dfs.namenode.host&port
    dfs.block.size
    dfs.replication
    mapred.jobtracker.host&port
    mapred.map.tasks
    mapred.reduce.tasks
  server -- server-specifics
    dfs.name.dir
    dfs.data.dir
    mapred.local.dir
  client -- user can override defaults and site here, but not server
    dfs.replication -- user overrides site
    mapred.map.tasks -- user overrides site
Following from this, we'd have the following instantiable classes:
  ServerConfiguration
    reads default, site, server-default, server, in that order.
    used by daemons
  ClientConfiguration
    reads default, site, client, in that order.
    used by client applications
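The two read orders could be sketched roughly as follows. This is only an illustration: the base class LayeredConfiguration and the loader function (which maps a layer name to that layer's properties) are assumptions made to keep the example free of file I/O, and none of it is existing Hadoop code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical base class: merges named layers in order, later layers win.
abstract class LayeredConfiguration {
    private final Map<String, String> merged = new HashMap<>();

    protected LayeredConfiguration(List<String> layerOrder,
                                   Function<String, Map<String, String>> loader) {
        for (String layer : layerOrder) {
            merged.putAll(loader.apply(layer));
        }
    }

    public String get(String key) {
        return merged.get(key);
    }
}

// Used by daemons: never reads the client layer.
final class ServerConfiguration extends LayeredConfiguration {
    ServerConfiguration(Function<String, Map<String, String>> loader) {
        super(Arrays.asList("default", "site", "server-default", "server"), loader);
    }
}

// Used by client applications: never reads the server layers.
final class ClientConfiguration extends LayeredConfiguration {
    ClientConfiguration(Function<String, Map<String, String>> loader) {
        super(Arrays.asList("default", "site", "client"), loader);
    }
}
```

Because each class fixes its own layer list, a value placed in the client layer cannot leak into a daemon, and server-only settings such as mapred.local.dir are invisible to client code.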
Rather than provide subclasses for different parts of the system, we should
instead use static methods. For example, we might have:
  JobConf.setNumMapTasks(ClientConfiguration conf, int count);
  HdfsConf.setReplication(ClientConfiguration conf, int replicas);
The point of these is compile-time checking of names and values while keeping
the code well partitioned. When we add a new HDFS parameter we should not have
to change code outside of HDFS; yet, without multiple inheritance, we cannot
have a single object that permits configuration of HDFS, MapReduce, etc.
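A sketch of how such static accessors might look, under stated assumptions: the ClientConfiguration here is a bare string-map stand-in, the getters with a fallback argument are additions for illustration, and the property names are the ones listed in the examples earlier in this message.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the proposed client-side configuration class (hypothetical).
final class ClientConfiguration {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

// MapReduce owns its parameter names; callers never spell the key.
final class JobConf {
    private static final String NUM_MAP_TASKS = "mapred.map.tasks";

    private JobConf() {}

    static void setNumMapTasks(ClientConfiguration conf, int count) {
        conf.set(NUM_MAP_TASKS, Integer.toString(count));
    }

    // Returns the fallback when the property is unset.
    static int getNumMapTasks(ClientConfiguration conf, int fallback) {
        String value = conf.get(NUM_MAP_TASKS);
        return value == null ? fallback : Integer.parseInt(value);
    }
}

// HDFS owns its parameter names in the same way.
final class HdfsConf {
    private static final String REPLICATION = "dfs.replication";

    private HdfsConf() {}

    static void setReplication(ClientConfiguration conf, int replicas) {
        conf.set(REPLICATION, Integer.toString(replicas));
    }

    // Returns the fallback when the property is unset.
    static int getReplication(ClientConfiguration conf, int fallback) {
        String value = conf.get(REPLICATION);
        return value == null ? fallback : Integer.parseInt(value);
    }
}
```

A misspelled key or a non-integer value then fails at compile time rather than at run time, and adding a new HDFS parameter touches only HdfsConf.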
Thoughts?
> Divide the server and client configurations
> -------------------------------------------
>
> Key: HADOOP-785
> URL: http://issues.apache.org/jira/browse/HADOOP-785
> Project: Hadoop
> Issue Type: Improvement
> Components: conf
> Affects Versions: 0.9.0
> Reporter: Owen O'Malley
> Assigned To: Arun C Murthy
> Fix For: 0.10.0
>
>
> The configuration system is easy to misconfigure and I think we need to
> strongly divide the server from client configs.
> An example of the problem was a configuration where the task tracker has a
> hadoop-site.xml that set mapred.reduce.tasks to 1. Therefore, the job tracker
> had the right number of reduces, but the map task thought there was a single
> reduce. This led to a hard-to-diagnose failure.
> Therefore, I propose separating out the configuration types as:
> class Configuration;
> // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;
> // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;
> // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;
> // reads mapred-server.xml, $super
> class ClientConf extends Configuration;
> // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;
> // reads job.xml, $super
> Note in particular, that nothing corresponds to hadoop-site.xml, which
> overrides both client and server configs. Furthermore, the properties from
> the *-default.xml files should never be saved into the job.xml.