[ 
https://issues.apache.org/jira/browse/HADOOP-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602175#action_12602175
 ] 

Hemanth Yamijala commented on HADOOP-3479:
------------------------------------------

Some initial thoughts, combining ideas Vivek expressed on a mail to core-dev.

We have the following options to consider:

- Format for the new configuration options
- Where to define them. This could depend on the format.

Looking at format first, I can think of 2 options. Given that there is a set of 
related properties (e.g. user list, default priority, resource limit), we could 
have a nested XML format, something like:
{code:xml}
<Grid>
  <Organization>
    <Name>org1</Name>
    <MaxCapacity>100</MaxCapacity>
    <Queues>
      <Queue>
        <Name>queue1</Name>
        <AllowedUsers>u1,u2,u3</AllowedUsers>
        <DisallowedUsers>u3,u4,u5</DisallowedUsers>
        <AllowedOverrides>False</AllowedOverrides>
      </Queue>
    </Queues>
  </Organization>
</Grid>
{code}

IMO, this is an intuitive model to capture the configuration. We can use a DOM 
parser, like the one currently used in Configuration.java, to construct 
Scheduler config objects.
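
To make the DOM idea concrete, here is a minimal sketch of walking the nested 
format and building config objects. It is written in Python for brevity, not 
Java, and the class and function names (Organization, Queue, parse_grid) are 
made up for illustration; nothing like this exists in Hadoop today. It also 
assumes every Organization carries a MaxCapacity value.

```python
from dataclasses import dataclass, field
from xml.dom import minidom

# Sample input in the nested format sketched above.
SAMPLE = """
<Grid>
  <Organization>
    <Name>org1</Name>
    <MaxCapacity>100</MaxCapacity>
    <Queues>
      <Queue>
        <Name>queue1</Name>
        <AllowedUsers>u1,u2,u3</AllowedUsers>
        <DisallowedUsers>u3,u4,u5</DisallowedUsers>
        <AllowedOverrides>False</AllowedOverrides>
      </Queue>
    </Queues>
  </Organization>
</Grid>
"""


@dataclass
class Queue:  # hypothetical scheduler config object
    name: str
    allowed_users: list
    disallowed_users: list
    allowed_overrides: bool


@dataclass
class Organization:  # hypothetical scheduler config object
    name: str
    max_capacity: int
    queues: list = field(default_factory=list)


def _text(parent, tag):
    """Text of the first *direct* child element named `tag`, or ''."""
    for node in parent.childNodes:
        if node.nodeType == node.ELEMENT_NODE and node.tagName == tag:
            return "".join(c.data for c in node.childNodes
                           if c.nodeType == c.TEXT_NODE).strip()
    return ""


def _csv(value):
    """Split a comma separated value, dropping empty entries."""
    return [v.strip() for v in value.split(",") if v.strip()]


def parse_grid(xml_text):
    """Build a list of Organization objects from the nested XML."""
    doc = minidom.parseString(xml_text)
    orgs = []
    for org_el in doc.getElementsByTagName("Organization"):
        org = Organization(_text(org_el, "Name"),
                           int(_text(org_el, "MaxCapacity")))
        for q_el in org_el.getElementsByTagName("Queue"):
            org.queues.append(Queue(
                name=_text(q_el, "Name"),
                allowed_users=_csv(_text(q_el, "AllowedUsers")),
                disallowed_users=_csv(_text(q_el, "DisallowedUsers")),
                allowed_overrides=_text(q_el, "AllowedOverrides").lower() == "true",
            ))
        orgs.append(org)
    return orgs
```

Note that the lookup of Name only considers direct children, since both 
Organization and Queue have a Name element.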

The main drawbacks of this approach are:
- A new format to administer. Could be a pain for administrators.
- Users may expect parity with the other features provided by standard Hadoop 
configuration.

The other format tries to retain the same structure as current Hadoop 
configuration, which is essentially a flat list of key-value pairs. Here's an 
example:
{code:xml}
<property>
    <name>hadoop.scheduler.orgs</name>
    <value>Org1,Org2</value>
    <description>Comma separated list of Org names</description>
</property>
<property>
    <name>hadoop.scheduler.Org1.max-capacity</name>
    <value>100</value>
</property>
<property>
    <name>hadoop.scheduler.Org1.queues</name>
    <value>q1,q2</value>
</property>
<property>
    <name>hadoop.scheduler.Org1.q1.allowedusers</name>
    <value>u1,u2,u3</value>
</property>
{code}

As shown above, the keys for the properties are used to indicate the grouping. 
All the properties for an Org would be under hadoop.scheduler.org-name, 
likewise for Queues. This implies that we need property names to be dynamically 
built by code and that we may need ways of listing all children of a given 
property - something that can possibly be solved using HADOOP-3407.
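
As a rough sketch of what "dynamically built" names and "listing children" 
could look like, the Python below works on a plain dict standing in for the 
loaded properties. The helpers key(), children() and load_orgs() are 
hypothetical and are not meant to describe what HADOOP-3407 provides.

```python
PREFIX = "hadoop.scheduler"

# Stand-in for the properties loaded from the config files.
props = {
    "hadoop.scheduler.orgs": "Org1,Org2",
    "hadoop.scheduler.Org1.max-capacity": "100",
    "hadoop.scheduler.Org1.queues": "q1,q2",
    "hadoop.scheduler.Org1.q1.allowedusers": "u1,u2,u3",
}


def key(*parts):
    """Build a property name such as hadoop.scheduler.Org1.queues."""
    return ".".join((PREFIX,) + parts)


def csv(value):
    """Split a comma separated value, dropping empty entries."""
    return [v.strip() for v in value.split(",") if v.strip()]


def children(props, parent):
    """Return {relative_name: value} for every property under `parent`."""
    prefix = parent + "."
    return {k[len(prefix):]: v
            for k, v in props.items() if k.startswith(prefix)}


def load_orgs(props):
    """Follow the org list, then each org's queue list, building keys as we go."""
    orgs = {}
    for org in csv(props.get(key("orgs"), "")):
        queues = {}
        for q in csv(props.get(key(org, "queues"), "")):
            queues[q] = {
                "allowedusers": csv(props.get(key(org, q, "allowedusers"), "")),
            }
        orgs[org] = {
            "max-capacity": props.get(key(org, "max-capacity")),
            "queues": queues,
        }
    return orgs
```

Here children(props, key("Org1")) returns the three Org1 entries with the 
prefix stripped, and load_orgs(props) yields an entry for Org2 with no 
capacity and no queues, since nothing is defined for it.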

This format is much less intuitive, and perhaps error-prone to administer? 
Also, we may have unnecessary restrictions or special handling on names. For 
example, a queue name cannot contain a '.', since that character separates the 
parts of a key.
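
A small Python sketch of why such a restriction would be needed: once names 
are embedded in dotted keys, a '.' inside a name makes the key impossible to 
split reliably. The function names are hypothetical, and treating ',' as 
reserved is my assumption, based on the comma separated lists used above.

```python
RESERVED = (".", ",")  # '.' separates key parts; ',' separates list entries


def validate_name(name):
    """Reject org/queue names that would make keys or lists ambiguous."""
    if not name:
        raise ValueError("name must not be empty")
    for ch in RESERVED:
        if ch in name:
            raise ValueError("name %r must not contain %r" % (name, ch))
    return name


def split_queue_key(key, prefix="hadoop.scheduler"):
    """Split '<prefix>.<org>.<queue>.<setting>' into (org, queue, setting)."""
    rest = key[len(prefix) + 1:]
    # Raises ValueError if the key does not have exactly three parts,
    # which is what happens when a name itself contains a '.'.
    org, queue, setting = rest.split(".")
    return org, queue, setting
```

With a queue called "q.1", the key hadoop.scheduler.Org1.q.1.allowedusers has 
four parts after the prefix, and there is no way to tell where the queue name 
ends.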

However, it allows us to have the same format as in Hadoop and use the same 
features related to configuration. It would also help us to reuse the basic 
code for parsing and reading in configuration.

Regarding where to define this configuration, the first format will necessitate 
a new file, as the current Hadoop configs are essentially a single-level 
hierarchy. The second option allows us to continue to use the current Hadoop 
config files: hadoop-default.xml and hadoop-site.xml.

Having a separate file has the benefit that we can define policies around how 
to manage updates to the file (e.g. by reading it periodically). However, it 
would add admin overhead, in that there is now one more file to administer.
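
One possible shape for the "read it periodically" policy, sketched in Python: 
check the file's modification time and re-parse only when it has changed. The 
ReloadingConfig class and its parse callback are hypothetical, and how often 
refresh() is called (for instance from a timer) is left open.

```python
import os


class ReloadingConfig:
    """Holds the parsed contents of a file and re-reads it when it changes."""

    def __init__(self, path, parse):
        self.path = path
        self.parse = parse      # callable: file text -> config object
        self._mtime = None
        self.value = None
        self.refresh()

    def refresh(self):
        """Re-read the file if its mtime changed. Returns True if reloaded."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._mtime:
            return False
        with open(self.path, encoding="utf-8") as f:
            self.value = self.parse(f.read())
        self._mtime = mtime
        return True
```

A check based on modification time is cheap, but it can miss edits that land 
within the file system's timestamp granularity, so this is only a starting 
point.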

Personally, I prefer the more intuitive format of Option 1. Though some 
learning is involved, it may be easier to learn this format. 

Comments from others? Any other options?

> Implement configuration items useful for Hadoop resource manager (v1)
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-3479
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3479
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>
> HADOOP-3421 lists requirements for a new resource manager for Hadoop. 
> Implementation for these will require support for new configuration items in 
> Hadoop. This JIRA is to define such configuration, and track its 
> implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
