[ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520098 ]
Owen O'Malley commented on HADOOP-785:
--------------------------------------

I think that Sameer's proposal of having a fixed or configurable list of properties that is overridden when the task is localized for a given server is a very good thing. It ensures that the override happens at exactly one spot: the switchover between the server code and the client code on the task tracker. Otherwise, we end up with the current situation, where the hadoop-site.xml on every node the configuration has passed through can override properties. In particular, I don't want the hadoop-site equivalent on the launching node to be consulted at all. It just provides one more location where things can be broken.

On the other hand, I've changed religions and I'm convinced that we want exactly one config file: hadoop-site.xml. All settings, both client and server, should go in there. So my proposed dataflow for JobConfs looks like:

1. The client creates a JobConf on the submit node, which reads hadoop-site.xml (and the read-only hadoop-default.xml).
2. The client fills in the desired values and submits the JobConf (by serializing it).
3. It is never modified by any other config files on any of the servers.
4. When the Task is starting, it is localized by looking in the server's config for hadoop.client.override, a list of properties to be copied over to the task's JobConf from the server's configuration. (A sketch of this step appears after the quoted issue below.)

The only piece that is missing is how to set the default number of reduces. I think the best way is to introduce a new pair of attributes:

mapred.map.tasks.default
mapred.reduce.tasks.default

which are used if the specific values aren't set.

Also note that Path.getFileSystem() should take a Configuration so that it is compatible with both server configs and JobConfs.

> Divide the server and client configurations
> -------------------------------------------
>
>                 Key: HADOOP-785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
> The configuration system is easy to misconfigure, and I think we need to strongly divide the server configs from the client configs.
> An example of the problem was a configuration where the task tracker had a hadoop-site.xml that set mapred.reduce.tasks to 1. The job tracker therefore had the right number of reduces, but the map task thought there was a single reduce. This led to a hard-to-diagnose failure.
> Therefore, I propose separating out the configuration types as:
>
> class Configuration;
> // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;
> // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;
> // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;
> // reads mapred-server.xml, $super
> class ClientConf extends Configuration;
> // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;
> // reads job.xml, $super
>
> Note in particular that nothing corresponds to hadoop-site.xml, which overrides both client and server configs. Furthermore, the properties from the *-default.xml files should never be saved into the job.xml.
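A minimal sketch of the localization step (4) above, assuming hadoop.client.override holds a comma-separated list of property names; the class and helper names (TaskLocalizer, localizeConf) are hypothetical, not a committed API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class TaskLocalizer {
  // Copy the server-side value of every property named in
  // hadoop.client.override into the task's JobConf, so the override
  // happens at exactly one spot: the server/client switchover.
  public static JobConf localizeConf(Configuration serverConf, JobConf jobConf) {
    String overrides = serverConf.get("hadoop.client.override", "");
    String[] names = overrides.split(",");
    for (int i = 0; i < names.length; ++i) {
      String name = names[i].trim();
      if (name.length() == 0) {
        continue;
      }
      String serverValue = serverConf.get(name);
      if (serverValue != null) {
        jobConf.set(name, serverValue);  // the server's value wins here
      }
    }
    return jobConf;
  }
}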
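A similar sketch of the proposed *.default attributes; the lookup order follows the comment above, but the -1 sentinel and the last-resort value of 1 are assumptions:

import org.apache.hadoop.conf.Configuration;

public class TaskDefaults {
  // Use mapred.reduce.tasks when the job set it explicitly; otherwise
  // fall back to the proposed mapred.reduce.tasks.default.
  public static int getNumReduceTasks(Configuration conf) {
    int n = conf.getInt("mapred.reduce.tasks", -1);
    if (n >= 0) {
      return n;
    }
    return conf.getInt("mapred.reduce.tasks.default", 1);  // 1 is an assumed last resort
  }
  // mapred.map.tasks / mapred.map.tasks.default would follow the same pattern.
}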
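Finally, the suggested Path.getFileSystem(Configuration) might look like the following; the toUri() helper and the FileSystem.get(URI, Configuration) factory used here are assumptions about the surrounding API:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Inside class Path: taking the Configuration explicitly means the caller
// can pass either a server config or a JobConf (which extends Configuration).
public FileSystem getFileSystem(Configuration conf) throws IOException {
  return FileSystem.get(toUri(), conf);
}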