[jira] Commented: (HADOOP-785) Divide the server and client configurations

Doug Cutting (JIRA) Wed, 06 Dec 2006 12:15:44 -0800

    [ http://issues.apache.org/jira/browse/HADOOP-785?page=comments#action_12456192 ]
Doug Cutting commented on HADOOP-785:
-------------------------------------


I think this is the right direction.  We logically have a tree.  Each node 
corresponds to a config file that inherits and overrides its parent's files.

Users need to be able to easily (1) remember the tree and (2) know where to 
specify a property within it.

I propose that the tree be organized around *where* in the cluster things are 
used, not *what* part of the code they configure (that's determined by the 
parameter name).  Where a setting takes effect is the primary source of 
confusion, and thus is what we must clarify.  In particular, we should 
distinguish between things used only by servers and things that clients may 
specify.

I propose the following tree:

default -- read-only defaults for things that clients can override
  site -- site-specific defaults
    server-default -- read-only defaults for server-only configuration
      server -- server overrides for this site
    client -- user overrides

The read-only default files serve as documentation of what parameters can be 
added to files lower in the tree.  It is a configuration error to specify 
something that does not have a default value above it.
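
As a rough illustration of that rule, a check along the following lines could 
flag stray properties.  This is only a sketch: LayerCheck and undeclaredKeys 
are hypothetical names, and each file is assumed to have been parsed into a 
Map already.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not an existing Hadoop class.
final class LayerCheck {
    private LayerCheck() {}

    /**
     * Returns the keys of {@code layer} that have no entry in any of the
     * read-only default files above it in the tree. A non-empty result
     * would be reported as a configuration error.
     */
    static Set<String> undeclaredKeys(Map<String, String> layer,
                                      List<Map<String, String>> defaultsAbove) {
        Set<String> unknown = new TreeSet<>(layer.keySet());
        for (Map<String, String> defaults : defaultsAbove) {
            unknown.removeAll(defaults.keySet());
        }
        return unknown;
    }
}
```

For a client file, defaultsAbove would hold only default; for a server file 
it would hold default and server-default.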

Some examples of what might be in the three files that are not read-only:
 
site -- site-specific defaults
  dfs.namenode.host&port
  dfs.block.size
  dfs.replication
  mapred.jobtracker.host&port
  mapred.map.tasks
  mapred.reduce.tasks

server -- server-specifics
  dfs.name.dir
  dfs.data.dir
  mapred.local.dir

client -- user can override defaults and site here, but not server
  dfs.replication -- user overrides site
  mapred.map.tasks -- user overrides site

Following from this, we'd have the following instantiable classes:

ServerConfiguration
  reads default, site, server-default, server, in that order.
  used by daemons

ClientConfiguration
  reads default, site, client, in that order.
  used by client applications
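
The read order of the two classes can be sketched as below, assuming a later 
file simply overrides an earlier one.  Layer and LayeredConfiguration are 
hypothetical stand-ins for parsed files and a shared base class; nothing here 
reads real XML.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one parsed config file.
final class Layer {
    final String name;
    final Map<String, String> props = new LinkedHashMap<>();

    Layer(String name) { this.name = name; }

    Layer set(String key, String value) {
        props.put(key, value);
        return this;
    }
}

// Merges layers in read order; a later layer overrides an earlier one.
class LayeredConfiguration {
    private final Map<String, String> merged = new LinkedHashMap<>();

    LayeredConfiguration(List<Layer> layersInReadOrder) {
        for (Layer layer : layersInReadOrder) {
            merged.putAll(layer.props);
        }
    }

    String get(String key) { return merged.get(key); }
}

// Used by daemons: default, site, server-default, server.
final class ServerConfiguration extends LayeredConfiguration {
    ServerConfiguration(Layer defaults, Layer site, Layer serverDefault, Layer server) {
        super(Arrays.asList(defaults, site, serverDefault, server));
    }
}

// Used by client applications: default, site, client.
final class ClientConfiguration extends LayeredConfiguration {
    ClientConfiguration(Layer defaults, Layer site, Layer client) {
        super(Arrays.asList(defaults, site, client));
    }
}
```

Because the client file is never read by ServerConfiguration, and the server 
files are never read by ClientConfiguration, a user override of 
dfs.replication cannot leak into a daemon, and a daemon's dfs.name.dir is 
invisible to clients.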

Rather than provide subclasses for different parts of the system, we should 
instead use static methods.  For example, we might have:

JobConf.setNumMapTasks(ClientConfiguration conf, int count);
HdfsConf.setReplication(ClientConfiguration conf, int replicas);

The point of these is compile-time checking of names and values while keeping 
the code well partitioned.  When we add a new HDFS parameter we should not have 
to change code outside of HDFS, yet, without multiple-inheritance, we cannot 
have a single object that permits configuration of HDFS, MapReduce, etc.
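
A minimal sketch of how such static methods might look follows.  The 
ClientConfiguration here is a bare stand-in with string get/set only, and the 
mapping of setNumMapTasks to mapred.map.tasks and of setReplication to 
dfs.replication is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Bare stand-in for the proposed ClientConfiguration.
final class ClientConfiguration {
    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) { props.put(name, value); }

    String get(String name) { return props.get(name); }
}

// MapReduce-side helper; the property name is known only here.
final class JobConf {
    private JobConf() {}

    static void setNumMapTasks(ClientConfiguration conf, int count) {
        conf.set("mapred.map.tasks", Integer.toString(count));
    }
}

// HDFS-side helper; adding a parameter touches only this class.
final class HdfsConf {
    private HdfsConf() {}

    static void setReplication(ClientConfiguration conf, int replicas) {
        conf.set("dfs.replication", Integer.toString(replicas));
    }
}
```

A caller passes the same ClientConfiguration to both helpers, so one object 
carries HDFS and MapReduce settings without either package depending on the 
other, and a mistyped name or a non-integer value fails at compile time.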

Thoughts?


  

> Divide the server and client configurations
> -------------------------------------------
>
>                 Key: HADOOP-785
>                 URL: http://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>         Assigned To: Arun C Murthy
>             Fix For: 0.10.0
>
>
> The configuration system is easy to misconfigure and I think we need to 
> strongly divide the server from client configs. 
> An example of the problem was a configuration where the task tracker has a 
> hadoop-site.xml that set mapred.reduce.tasks to 1. Therefore, the job tracker 
> had the right number of reduces, but the map task thought there was a single 
> reduce. This led to a hard-to-diagnose failure.
> Therefore, I propose separating out the configuration types as:
> class Configuration;
> // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;
> // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;
> // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;
> // reads mapred-server.xml, $super
> class ClientConf extends Configuration;
> // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;
> // reads job.xml, $super
> Note in particular, that nothing corresponds to hadoop-site.xml, which 
> overrides both client and server configs. Furthermore, the properties from 
> the *-default.xml files should never be saved into the job.xml.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
