[jira] Commented: (HADOOP-785) Divide the server and client configurations

Devaraj Das (JIRA) Fri, 03 Aug 2007 07:00:16 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517538 ]


Devaraj Das commented on HADOOP-785:
------------------------------------

bq. I'd rather keep hadoop-default.xml sacrosanct, though we don't prevent you 
from editing it even today - thus it serves as a gold-standard for everyone

+1

bq. This division could be done with xml comments - I don't think it needs to 
be so formal as to need a new field.

+1

bq. Why don't you want to split up namenode vs. jobtracker and datanode vs. 
tasktracker? I understand that it's desirable to keep things simple, but dfs 
and mapreduce don't interact very much in terms of their configs, so there is a 
natural separation.

This could probably be addressed by a clear (documentation-wise) separation in 
the configuration file(s). This is already done today in hadoop-default.xml, 
which has three sections: "global properties", "map/reduce properties" and 
"file system properties".

Having the classes {Client, Server, Job}Configuration seems interesting, but 
one issue that needs to be looked at is the one Michael points out: some config 
items are needed by both the server and the client. Items like fs.default.name 
can be handled fairly easily, though that amounts to having duplicate config 
items in the files. The other (more semantic) issue concerns items like 
ipc.client.connection.maxidletime. This config item is used, for example, by 
the TaskTracker to set its client-side connection idle timeout for the RPCs to 
the JobTracker. However, it also affects the timeout that the Tasks 
(Map/Reduce) see, and unless we have different values for this item in the 
server and client config files, both entities will see the same timeout value. 
This could be an issue, since for Tasks I would set the value to a very high 
number (ref HADOOP-1651).
To summarize, we might end up with a couple of duplicate config items, 
potentially with different values. Does this seem like a problem? I am okay 
with such an arrangement, but I wanted to bring up this issue while we are 
designing the system. This also brings us back to Doug's question of whether 
it makes sense to have a separate client-only configuration.
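A minimal sketch of the duplicate-item situation, assuming each config file is modeled as an in-memory map (no XML parsing) and that later layers override earlier ones. The class names `LayeredConf` and `DuplicateKeyDemo`, and the timeout values, are hypothetical and purely illustrative; they are not Hadoop classes or shipped defaults.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical layered lookup: each "file" is an in-memory map, not parsed XML.
class LayeredConf {
    private final Map<String, String> merged = new HashMap<>();

    LayeredConf(List<Map<String, String>> layers) {
        for (Map<String, String> layer : layers) {
            merged.putAll(layer); // later layers override earlier ones
        }
    }

    String get(String key) {
        return merged.get(key);
    }
}

class DuplicateKeyDemo {
    static final String IDLE = "ipc.client.connection.maxidletime";

    // Illustrative values only, not Hadoop's defaults.
    static final Map<String, String> DEFAULTS = Map.of(IDLE, "1000");
    static final Map<String, String> SERVER_FILE = Map.of();              // does not mention the key
    static final Map<String, String> CLIENT_FILE = Map.of(IDLE, "600000"); // very high for Tasks

    // What a daemon such as the TaskTracker would see.
    static LayeredConf serverSide() {
        return new LayeredConf(List.of(DEFAULTS, SERVER_FILE));
    }

    // What a Task (client side) would see.
    static LayeredConf clientSide() {
        return new LayeredConf(List.of(DEFAULTS, CLIENT_FILE));
    }

    public static void main(String[] args) {
        System.out.println("TaskTracker sees " + serverSide().get(IDLE)); // 1000
        System.out.println("Task sees " + clientSide().get(IDLE));        // 600000
    }
}
```

The same key resolves differently on each side only because the client file repeats it with its own value, which is exactly the duplication being asked about.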

Also, the current framework has a bug: e.g., if I programmatically set 
speculative execution to false in the JobConf, the framework does not take it 
into account. The framework has already read the value from its own config 
files before I submitted my job, and it ignores my setting. This is desirable 
for some config items like fs.default.name, where we DON'T want clients to tell 
us what the namenode is, but not for things like mapred.speculative.execution. 
This issue probably needs to be handled in this redesign.
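One possible resolution rule, sketched under the assumption that the framework would consult the submitted job's own settings first unless the cluster has locked a key. `JobSettingResolver`, the locked-key set and the host names are hypothetical; this is not how the current framework behaves.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical rule: a job's own settings win, except for keys the cluster locks.
class JobSettingResolver {
    private final Map<String, String> serverConf;
    private final Set<String> lockedKeys;

    JobSettingResolver(Map<String, String> serverConf, Set<String> lockedKeys) {
        this.serverConf = serverConf;
        this.lockedKeys = lockedKeys;
    }

    String resolve(Map<String, String> jobConf, String key) {
        if (lockedKeys.contains(key) || !jobConf.containsKey(key)) {
            return serverConf.get(key); // cluster-side value is authoritative
        }
        return jobConf.get(key);        // honor the job's programmatic setting
    }

    public static void main(String[] args) {
        // Illustrative values; the host names are made up.
        Map<String, String> server = Map.of(
            "fs.default.name", "namenode.example:9000",
            "mapred.speculative.execution", "true");
        Map<String, String> job = Map.of(
            "fs.default.name", "elsewhere.example:9000",
            "mapred.speculative.execution", "false");
        JobSettingResolver r = new JobSettingResolver(server, Set.of("fs.default.name"));
        System.out.println(r.resolve(job, "mapred.speculative.execution")); // false
        System.out.println(r.resolve(job, "fs.default.name"));              // namenode.example:9000
    }
}
```

Keeping the locked set explicit makes the split between "clients may override" and "clients must not override" a per-key decision rather than a side effect of when the files were read.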

> Divide the server and client configurations
> -------------------------------------------
>
>                 Key: HADOOP-785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>
> The configuration system is easy to misconfigure and I think we need to 
> strongly divide the server from client configs. 
> An example of the problem was a configuration where the task tracker had a 
> hadoop-site.xml that set mapred.reduce.tasks to 1. Therefore, the job tracker 
> had the right number of reduces, but the map task thought there was a single 
> reduce. This led to a hard-to-diagnose failure.
> Therefore, I propose separating out the configuration types as:
> class Configuration;                        // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;     // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;     // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;  // reads mapred-server.xml, $super
> class ClientConf extends Configuration;     // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;           // reads job.xml, $super
> Note in particular that nothing corresponds to hadoop-site.xml, which 
> overrides both client and server configs. Furthermore, the properties from 
> the *-default.xml files should never be saved into the job.xml.
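The proposed hierarchy can be modeled as a small standalone sketch. It assumes the proposal's lists are written highest-precedence-first, so each class here loads its parent's resources before its own file (later loads override earlier ones). These classes only record resource names; they are not `org.apache.hadoop.conf` classes and read no XML. `mayBeSavedToJobXml` and `HierarchyDemo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone model of the proposed hierarchy (not org.apache.hadoop.conf).
// resources() returns load order; later entries override earlier ones.
class Configuration {
    List<String> resources() {
        List<String> r = new ArrayList<>();
        r.add("hadoop-default.xml");
        r.add("site-default.xml"); // assumed to override hadoop-default.xml
        return r;
    }
}

class ServerConf extends Configuration {
    @Override
    List<String> resources() {
        List<String> r = super.resources(); // "$super"
        r.add("hadoop-server.xml");
        return r;
    }
}

class DfsServerConf extends ServerConf {
    @Override
    List<String> resources() {
        List<String> r = super.resources();
        r.add("dfs-server.xml");
        return r;
    }
}

class MapRedServerConf extends ServerConf {
    @Override
    List<String> resources() {
        List<String> r = super.resources();
        r.add("mapred-server.xml");
        return r;
    }
}

class ClientConf extends Configuration {
    @Override
    List<String> resources() {
        List<String> r = super.resources();
        r.add("hadoop-client.xml");
        return r;
    }
}

class JobConf extends ClientConf {
    @Override
    List<String> resources() {
        List<String> r = super.resources();
        r.add("job.xml");
        return r;
    }

    // A property whose value came from a *-default.xml file is never written to job.xml.
    static boolean mayBeSavedToJobXml(String sourceResource) {
        return !sourceResource.endsWith("-default.xml");
    }
}

class HierarchyDemo {
    public static void main(String[] args) {
        System.out.println(new JobConf().resources());
        System.out.println(new DfsServerConf().resources());
    }
}
```

Note that no class in this model ever loads hadoop-site.xml, so a task tracker's local file cannot silently change what a job's tasks see.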

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
