Proposed init-hcfs improvements

Jonathan Kelly Mon, 14 Sep 2015 17:38:36 -0700

Hey, all,

I'm working on some improvements to init-hcfs.groovy to make
init-hcfs.json more maintainable and more configurable (i.e., only
create directories for the apps that are installed, if using the newly
merged roles feature), but before I get too far I'd like to run my ideas
by the community.


Here is how I'm thinking it will work:

   1. In the site.yaml, you'll specify some new properties that look
   something like [1].
      - Alternatively, rather than having to specify the directories via
      site.yaml, maybe we can define a new resource type representing an HDFS
      directory and declare all required HDFS directories for each app in their
      own manifest files. (I think this is the spirit behind
      https://issues.apache.org/jira/browse/BIGTOP-1772?) But in order to
      do this we'd need some Puppet magic that can aggregate all of these
      resources and only call init-hdfs once, passing in all of these
      resources. Is this even possible? I know resource collectors can be
      used to order all of these "hdfs_dir" resources before the "init_hdfs"
      exec resource, but I don't know how you'd make the "init_hdfs" exec
      resource operate on all of the collected "hdfs_dir" resources, if I'm
      making any sense.
   2. The hadoop::init_hdfs Puppet class will write out a file called
   /var/lib/hadoop-hdfs/init-hcfs.yaml that looks something like [2] (very
   similar to [1] of course).
      - Note that I think it might be best to use YAML instead of JSON,
      since YAML files are much easier to write out using a template. On the
      other hand, Groovy doesn't have built-in support for YAML like it has for
      JSON, so it might be more difficult to modify init-hcfs.groovy to consume
      this YAML file, or at least I'd need to add SnakeYAML as a dependency.
      Any thoughts on this?
      - If I do add a dependency on SnakeYAML, it would probably also be
      good to implement https://issues.apache.org/jira/browse/BIGTOP-1871
      along with this change so that init-hcfs.groovy can have its own install
      location rather than being part of the hadoop-hdfs package.
   3. When calling init-hdfs.sh, the hadoop::init_hdfs Puppet class will
   pass it this newly generated /var/lib/hadoop-hdfs/init-hcfs.yaml file,
   which init-hdfs.sh will in turn pass through to init-hcfs.groovy.
   4. Finally, init-hcfs.groovy will be changed to read from this YAML file
   (in the format shown in [2]) instead of the hardcoded JSON file we have
   been using, assuming the community is OK with this file format change.

Thanks,
Jonathan Kelly

[1]
hadoop::init_hdfs::hdfs_root_user: hdfs
hadoop::init_hdfs::dirs:
  /tmp:
    perms: 1777
  /user:
    perms: 755
    owner: ${hadoop::init_hdfs::hdfs_root_user}
  /user/root:
    perms: 777
    owner: root
  /var/log:
    perms: 1775
    owner: yarn
    group: mapred
  /tmp/hadoop-yarn:
    perms: 777
    owner: mapred
    group: mapred
  /var/log/hadoop/yarn/apps:
    perms: 1777
    owner: yarn
    group: mapred
  /user/history:
    perms: 755
    owner: mapred
    group: mapred
hadoop::init_hdfs::users:
  tom:
    perms: 755
  alice:
    perms: 755
  bigtop:
    perms: 755

[2]
---
root_user: 'hdfs'
dirs:
  '/tmp':
    'perms': '1777'
  '/tmp/hadoop-yarn':
    'perms': '777'
    'owner': 'mapred'
    'group': 'mapred'
  '/user':
    'perms': '755'
    'owner': 'hdfs'
  '/user/history':
    'perms': '755'
    'owner': 'mapred'
    'group': 'mapred'
  '/user/root':
    'perms': '777'
    'owner': 'root'
  '/var/log':
    'perms': '1775'
    'owner': 'yarn'
    'group': 'mapred'
  '/var/log/hadoop/yarn/apps':
    'perms': '1777'
    'owner': 'yarn'
    'group': 'mapred'
users:
  'tom':
    'perms': '755'
  'alice':
    'perms': '755'
  'bigtop':
    'perms': '755'
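
For what it's worth, the mapping from [1] to [2] is mechanical: strip the
hadoop::init_hdfs:: prefix, substitute the root user, turn every value into a
quoted string, and sort the directories. In practice the Puppet template
would do this; the Python below is only a sketch of the transformation, and
the helper names (flatten, render) are hypothetical. It also assumes no
value contains a single quote, since it does no YAML escaping.

```python
# Illustrative sketch of how properties like [1] map onto a file like [2].
# flatten() and render() are hypothetical names, not existing Bigtop code.

PREFIX = "hadoop::init_hdfs::"


def flatten(site):
    """Strip the Puppet-style prefix and resolve the root-user placeholder."""
    root_user = site[PREFIX + "hdfs_root_user"]
    placeholder = "${" + PREFIX + "hdfs_root_user}"

    def clean(spec):
        # Every value becomes a string, as in [2]; the placeholder is replaced.
        return {k: str(v).replace(placeholder, root_user)
                for k, v in (spec or {}).items()}

    dirs = site.get(PREFIX + "dirs", {})
    users = site.get(PREFIX + "users", {})
    return {
        "root_user": root_user,
        # Directories are sorted by path, matching the ordering in [2].
        "dirs": {path: clean(dirs[path]) for path in sorted(dirs)},
        # Users keep their original order, as in [2].
        "users": {name: clean(spec) for name, spec in users.items()},
    }


def render(flat):
    """Emit the flattened structure as single-quoted YAML text."""
    lines = ["---", f"root_user: '{flat['root_user']}'"]
    for section in ("dirs", "users"):
        lines.append(f"{section}:")
        for name, spec in flat[section].items():
            lines.append(f"  '{name}':")
            for key, value in spec.items():
                lines.append(f"    '{key}': '{value}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    site = {
        PREFIX + "hdfs_root_user": "hdfs",
        PREFIX + "dirs": {
            "/user": {"perms": 755,
                      "owner": "${hadoop::init_hdfs::hdfs_root_user}"},
            "/tmp": {"perms": 1777},
        },
        PREFIX + "users": {"tom": {"perms": 755}},
    }
    print(render(flatten(site)), end="")
```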
