[ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520057 ]

Doug Cutting commented on HADOOP-785:
-------------------------------------

Sameer, I think your proposal is mostly isomorphic to Arun's, but with 
redundancy.  The stuff in your hadoop-server.xml and hadoop.client.override is 
the same as in Arun's hadoop-final.xml on servers.  hadoop-final.xml would not 
normally exist on clients, only on servers.  So the differences between your 
proposals are (a) the names of the files; and (b) that you also want to list 
non-overrideable parameter names in a parameter.  The latter seems fragile and 
hard to maintain to me.

It does make sense to have overrideable values on the server too, e.g., to 
determine the default block size for client programs which don't override it.  
Under Arun's proposal this would be in hadoop-initial.xml on the servers.  
Where would it be in your proposal?  As items in hadoop-server.xml that are not 
named in hadoop.client.override?  Is this really less confusing?

Another issue with your proposal is that it requires different Configuration 
construction code on clients and servers.  Do we always know, everywhere that a 
Configuration is created, whether we are running as a client or a server?  Our 
servers use much of our client code: a MapReduce server is an HDFS client, etc. 
 I think this is more reliably done by using uniform Configuration construction 
code, and simply configuring server hosts differently from client hosts, if 
different configurations are even required.  In most cases this should not be 
required, since clients have little need to specify non-overrideable values, 
hence hadoop-final.xml should generally only exist on servers.

We're mostly talking about host-specific settings, not server/client 
distinctions.  Some things should not be overridden because they're specific to 
the host.  Thus they should be overridden by a file on that host whose sole 
purpose is to do this.  This concept makes sense on both client and server 
machines.
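The resolution order being argued for here — uniform Configuration construction everywhere, with a host-local final resource that pins values wherever that file happens to exist — could be sketched as follows. This is a hypothetical simulation, not Hadoop's actual Configuration code; the resource names in the comments follow the discussion above.

```java
import java.util.*;

// Hypothetical sketch of layered configuration resolution.  Resources
// are applied in load order; keys set by a "final" resource (e.g. a
// server host's hadoop-final.xml) cannot be overridden by resources
// loaded later.  Not Hadoop's actual Configuration implementation.
class LayeredConf {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> locked = new HashSet<>();

    // Apply one resource's properties; if isFinal, lock its keys
    // against later overrides.
    void addResource(Map<String, String> props, boolean isFinal) {
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (!locked.contains(e.getKey())) {
                values.put(e.getKey(), e.getValue());
            }
            if (isFinal) {
                locked.add(e.getKey());
            }
        }
    }

    String get(String key) {
        return values.get(key);
    }
}
```

With uniform construction, a client host simply has no final resource, so every value stays overridable there; a server host's final file pins its host-specific values, while overridable defaults (e.g. a default block size) live in the ordinary resources and can still be overridden by jobs.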

> Divide the server and client configurations
> -------------------------------------------
>
>                 Key: HADOOP-785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>
> The configuration system is easy to misconfigure and I think we need to 
> strongly divide the server from client configs. 
> An example of the problem was a configuration where the task tracker has a 
> hadoop-site.xml that set mapred.reduce.tasks to 1. Therefore, the job tracker 
> had the right number of reduces, but the map task thought there was a single 
> reduce. This led to a failure that was hard to diagnose.
> Therefore, I propose separating out the configuration types as:
> class Configuration;
> // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;
> // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;
> // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;
> // reads mapred-server.xml, $super
> class ClientConf extends Configuration;
> // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;
> // reads job.xml, $super
> Note in particular, that nothing corresponds to hadoop-site.xml, which 
> overrides both client and server configs. Furthermore, the properties from 
> the *-default.xml files should never be saved into the job.xml.
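For illustration, the hierarchy proposed in the quoted description could be sketched as below. This is a hypothetical simulation that only records which resource files each class would read, in load order; no XML is actually parsed.

```java
import java.util.*;

// Hypothetical simulation of the proposed configuration hierarchy;
// each class appends the resource file it would read.  Note that no
// class corresponds to hadoop-site.xml, so nothing overrides both the
// client and server configurations at once.
class Configuration {
    final List<String> resources = new ArrayList<>(
        List.of("site-default.xml", "hadoop-default.xml"));
}
class ServerConf extends Configuration {
    ServerConf() { resources.add("hadoop-server.xml"); }
}
class DfsServerConf extends ServerConf {
    DfsServerConf() { resources.add("dfs-server.xml"); }
}
class MapRedServerConf extends ServerConf {
    MapRedServerConf() { resources.add("mapred-server.xml"); }
}
class ClientConf extends Configuration {
    ClientConf() { resources.add("hadoop-client.xml"); }
}
class JobConf extends ClientConf {
    JobConf() { resources.add("job.xml"); }
}
```

Because superclass construction runs first, defaults load before the more specific resources, so later (more specific) values win — and a client-side JobConf can never pick up hadoop-server.xml, which is the separation the proposal is after.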

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
