[ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520092 ]

Doug Cutting commented on HADOOP-785:
-------------------------------------

bq. The default block size for client programs would be in hadoop-client.xml [...]

Where would the default block size for server programs be set?  In 
hadoop-server.xml?

It sounds like you want to break what Arun's calling hadoop-initial.xml into 
two files, a client version and a server version, and to replace hadoop-final.xml 
with a parameter that names the values which may not be overridden, with that 
parameter used only on "servers". Is that a fair comparison?
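For concreteness, here is a minimal, self-contained Java sketch of how such a 
"names of non-overridable values" parameter could behave on a server. The 
property name hadoop.server.final.keys, the use of java.util.Properties, and 
the loading order are all assumptions made for illustration, not the actual 
Hadoop Configuration API.

    import java.util.HashSet;
    import java.util.Properties;
    import java.util.Set;

    // Hypothetical sketch: a server-side config that reads a parameter listing
    // keys which may not be overridden by later (e.g. per-job) settings.
    // The property name "hadoop.server.final.keys" is invented for illustration.
    public class ServerSideConf {
        private final Properties props = new Properties();
        private final Set<String> finalKeys = new HashSet<>();

        // Load the server's own settings and remember which keys are locked.
        public void loadServerResource(Properties serverProps) {
            props.putAll(serverProps);
            String locked = serverProps.getProperty("hadoop.server.final.keys", "");
            for (String key : locked.split(",")) {
                if (!key.trim().isEmpty()) {
                    finalKeys.add(key.trim());
                }
            }
        }

        // Apply client/job-supplied settings, skipping any key the server locked.
        public void loadClientResource(Properties clientProps) {
            for (String key : clientProps.stringPropertyNames()) {
                if (!finalKeys.contains(key)) {
                    props.setProperty(key, clientProps.getProperty(key));
                }
            }
        }

        public String get(String key) {
            return props.getProperty(key);
        }
    }

On a pure client, the same class could simply never call loadServerResource, 
so nothing would be locked there.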

My belief is that the primary reason we've seen misconfiguration is that 
folks don't understand that hadoop-site.xml is not overridable by jobs on 
servers. We've encouraged folks to put most things in that file 
(hadoop-site.xml), when in fact it should only be used for very limited 
purposes, mostly host-specific paths. This has caused many serious 
problems. But we shouldn't overreact. We should fix this issue. We should 
make it clearer where most things belong, and which particular things should 
not be overridable.
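To make the failure mode concrete, here is a small stand-alone Java sketch of 
the override ordering that bites people: resources loaded later win, and on a 
server the site file is applied after the job's own settings. The file names 
and values come from the example in this issue's description; the loading code 
itself is a stand-in using java.util.Properties, not Hadoop's Configuration.

    import java.util.Properties;

    // Illustrative only: later setProperty calls model later-loaded resources.
    public class SiteOverrideDemo {
        public static void main(String[] args) {
            Properties conf = new Properties();

            // hadoop-default.xml: shipped defaults.
            conf.setProperty("mapred.reduce.tasks", "1");

            // job.xml: the job asked for 20 reduces.
            conf.setProperty("mapred.reduce.tasks", "20");

            // hadoop-site.xml on the task tracker: a host admin pinned the
            // value, so the map tasks now see a single reduce.
            conf.setProperty("mapred.reduce.tasks", "1");

            System.out.println("reduces seen by the task: "
                    + conf.getProperty("mapred.reduce.tasks"));  // prints 1
        }
    }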

The root of the problem might be:

http://lucene.apache.org/hadoop/api/overview-summary.html#overview_description

This is where we first encourage all users of Hadoop to edit the wrong file.

I don't think that, long-term, client and server are fundamental distinctions 
in Hadoop: we run clients on servers and will probably do the converse someday. 
So I am hesitant to hardwire these in as fundamental concepts in the 
configuration system, which is itself fundamental. I think the notion of 
host-specific settings which cannot be overridden is a universal concept, and 
I would rather focus on making that distinction clear to users.

> Divide the server and client configurations
> -------------------------------------------
>
>                 Key: HADOOP-785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>
> The configuration system is easy to misconfigure and I think we need to 
> strongly divide the server from client configs. 
> An example of the problem was a configuration where the task tracker had a 
> hadoop-site.xml that set mapred.reduce.tasks to 1. The job tracker had the 
> right number of reduces, but the map task thought there was a single 
> reduce. This led to a failure that was hard to diagnose.
> Therefore, I propose separating out the configuration types as:
> class Configuration;
> // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;
> // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;
> // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;
> // reads mapred-server.xml, $super
> class ClientConf extends Configuration;
> // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;
> // reads job.xml, $super
> Note, in particular, that nothing corresponds to hadoop-site.xml, which 
> overrides both client and server configs. Furthermore, the properties from 
> the *-default.xml files should never be saved into job.xml.
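For reference, a minimal Java sketch of how the hierarchy described above could 
chain its resources. The addResource stub here merely stands in for whatever 
XML-loading the real Configuration would do; it is not the actual API, just an 
illustration of the proposed constructor chaining.

    // Each constructor implicitly runs its superclass constructor first, so
    // resources are applied in the order the proposal lists them.
    class Configuration {
        protected void addResource(String name) {
            // would parse and merge the named XML resource
        }
        public Configuration() {
            addResource("site-default.xml");
            addResource("hadoop-default.xml");
        }
    }

    class ServerConf extends Configuration {
        public ServerConf() { addResource("hadoop-server.xml"); }
    }

    class DfsServerConf extends ServerConf {
        public DfsServerConf() { addResource("dfs-server.xml"); }
    }

    class MapRedServerConf extends ServerConf {
        public MapRedServerConf() { addResource("mapred-server.xml"); }
    }

    class ClientConf extends Configuration {
        public ClientConf() { addResource("hadoop-client.xml"); }
    }

    class JobConf extends ClientConf {
        public JobConf() { addResource("job.xml"); }
    }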

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.