[ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525178 ]

Doug Cutting commented on HADOOP-785:
-------------------------------------

I think we should implement a simple, consistent policy: 
 - a configuration processes a single list of configuration files;
 - any file can contain parameters labeled 'final' (example below);
 - final parameters may not be altered by subsequent files;
 - serializations of a Configuration, like job.xml, will not contain any final declarations.
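
For example, a final parameter would be declared in a configuration file roughly 
like this (the property name and value are just an illustration):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode:8020/</value>
      <final>true</final>
    </property>

Any later file in the list that tried to set fs.default.name would then be 
ignored.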

There need be no special cases for hadoop-default.xml or hadoop-site.xml.  
They're just the first and second files in the list.  Thus, if someone specifies 
a final parameter in hadoop-default.xml, it is effectively a constant.  If 
someone specifies a final value in a client-side hadoop-site.xml, that 
value may still be overridden in a task process, where the local 
hadoop-site.xml file is loaded before the final-free job.xml.
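
To make the ordering concrete, a task process would build its resource list 
roughly as follows (the call sequence is a sketch, not code from the patch):

    Configuration conf = new Configuration();
    conf.addResource("hadoop-default.xml"); // first file in the list
    conf.addResource("hadoop-site.xml");    // task-local site file; may mark values final
    conf.addResource("job.xml");            // serialized job conf; contains no final declarations

Because job.xml is loaded last and carries no final declarations, any value the 
task-local hadoop-site.xml marked final wins in the task.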

We should deprecate the Configuration methods addFinalResource and 
addDefaultResource, and replace them with a single public addResource method.  
For back-compatibility, we must still keep track of the list of resources 
added, and of the position of the last default resource in that list; 
addDefaultResource must insert its resource after the last default 
resource and then trigger reloading of all resources.  But this 
back-compatibility code should be clearly marked for removal when the 
deprecated methods are removed.  In the long term we should no longer need to 
re-load resources or even track the list of resources added: the addResource 
method will simply load the file into the configuration.
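
A minimal sketch of what that might look like (field and helper names such as 
resources, lastDefaultIdx, loadResource, and reloadAll are made up for 
illustration, not taken from the patch):

    import java.util.ArrayList;
    import java.util.List;

    public class Configuration {
      // BACK-COMPAT bookkeeping; remove when the deprecated methods go away.
      private final List<String> resources = new ArrayList<String>();
      private int lastDefaultIdx = -1; // position of the last default resource

      /** Loads a resource; in the long term this is all addResource needs to do. */
      public void addResource(String name) {
        resources.add(name);
        loadResource(name);
      }

      /** @deprecated use {@link #addResource(String)} instead */
      public void addDefaultResource(String name) {
        // Insert after the last default resource, then re-load everything
        // so that later (site, final-bearing) resources still win.
        resources.add(++lastDefaultIdx, name);
        reloadAll();
      }

      /** @deprecated use {@link #addResource(String)} instead */
      public void addFinalResource(String name) {
        addResource(name); // simply appended at the end of the list
      }

      private void loadResource(String name) {
        // parse the named XML file and merge its properties into this conf,
        // honoring any 'final' markers already set
      }

      private void reloadAll() {
        for (String r : resources) {
          loadResource(r); // re-load each tracked resource, in list order
        }
      }
    }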

Does that sound reasonable?

> Divide the server and client configurations
> -------------------------------------------
>
>                 Key: HADOOP-785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-785_1_20070903.patch
>
>
> The configuration system is easy to misconfigure, and I think we need to 
> strongly divide the server from the client configs. 
> An example of the problem was a configuration where the task tracker had a 
> hadoop-site.xml that set mapred.reduce.tasks to 1. As a result, the job tracker 
> had the right number of reduces, but the map task thought there was a single 
> reduce. This led to a hard-to-diagnose failure.
> Therefore, I propose separating out the configuration types as:
> class Configuration;                        // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;     // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;     // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;  // reads mapred-server.xml, $super
> class ClientConf extends Configuration;     // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;           // reads job.xml, $super
> Note in particular that nothing corresponds to hadoop-site.xml, which 
> overrides both client and server configs. Furthermore, the properties from 
> the *-default.xml files should never be saved into the job.xml.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
