Sounds good to me: the roles are definitely the way to go.

1) I suggest keeping JSON, but won't turn down a patch that switches to YAML, so 
long as the docs are updated accordingly as well. This might not be worth it 
though.

2) init-hdfs.sh should over time be deleted and replaced with a pure HCFS 
implementation, so maybe we can do that as part of this work?

> On Sep 14, 2015, at 8:37 PM, Jonathan Kelly <[email protected]> wrote:
> 
> Hey, all,
> 
> I'm working on some improvements to init-hcfs.groovy in order to make
> init-hcfs.json more maintainable and also add the ability to make it more
> configurable (i.e., only create directories for the apps that are
> installed, if using the newly merged roles feature), but before I get too
> far I'd like to run my ideas by the community.
> 
> Here is how I'm thinking it will work:
> 
>   1. In the site.yaml, you'll specify some new properties that look
>   something like [1].
>      - Alternatively, rather than having to specify the directories via
>      site.yaml, maybe we can define a new resource type representing an HDFS
>      directory and declare all required HDFS directories for each app in their
>      own manifest files. (I think this is the spirit behind
>      https://issues.apache.org/jira/browse/BIGTOP-1772?) But in order to
>      do this we'd need some Puppet magic that can aggregate all of these
>      resources and only call init-hdfs once, passing in all of these
>      resources. Is this even possible? I know resource collectors can be
>      used to order all of these "hdfs_dir" resources before the "init_hdfs"
>      exec resource, but I don't know how you'd make the "init_hdfs" exec
>      resource operate on all of the collected "hdfs_dir" resources, if I'm
>      making any sense.
>   2. The hadoop::init_hdfs Puppet class will write out a file called
>   /var/lib/hadoop-hdfs/init-hcfs.yaml that looks something like [2] (very
>   similar to [1] of course).
>      - Note that I think it might be best to use YAML instead of JSON,
>      since YAML files are much easier to write out using a template. On the
>      other hand, Groovy doesn't have built-in support for YAML like it has for
>      JSON, so it might be more difficult to modify init-hcfs.groovy to consume
>      this YAML file, or at least I'd need to add SnakeYAML as a dependency.
>      Any thoughts on this?
>      - If I do add a dependency on SnakeYAML, it would probably also be
>      good to implement https://issues.apache.org/jira/browse/BIGTOP-1871
>      along with this change so that init-hcfs.groovy can have its own install
>      location rather than being part of the hadoop-hdfs package.
>   3. When calling init-hdfs.sh, the hadoop::init_hdfs Puppet class will
>   pass it this newly generated /var/lib/hadoop-hdfs/init-hcfs.yaml file,
>   which it will pass through to init-hcfs.groovy.
>   4. Finally, init-hcfs.groovy will be changed to read from this YAML file
>   in the format below instead of the hardcoded JSON file we have been using,
>   assuming the community is OK with this file format change.
> 
> Thanks,
> Jonathan Kelly
> 
> [1]
> hadoop::init_hdfs::hdfs_root_user: hdfs
> hadoop::init_hdfs::dirs:
>  /tmp:
>    perms: 1777
>  /user:
>    perms: 755
>    owner: ${hadoop::init_hdfs::hdfs_root_user}
>  /user/root:
>    perms: 777
>    owner: root
>  /var/log:
>    perms: 1775
>    owner: yarn
>    group: mapred
>  /tmp/hadoop-yarn:
>    perms: 777
>    owner: mapred
>    group: mapred
>  /var/log/hadoop/yarn/apps:
>    perms: 1777
>    owner: yarn
>    group: mapred
>  /user/history:
>    perms: 755
>    owner: mapred
>    group: mapred
> hadoop::init_hdfs::users:
>  tom:
>    perms: 755
>  alice:
>    perms: 755
>  bigtop:
>    perms: 755
> 
> [2]
> ---
> root_user: 'hdfs'
> dirs:
>  '/tmp':
>    'perms': '1777'
>  '/tmp/hadoop-yarn':
>    'perms': '777'
>    'owner': 'mapred'
>    'group': 'mapred'
>  '/user':
>    'perms': '755'
>    'owner': 'hdfs'
>  '/user/history':
>    'perms': '755'
>    'owner': 'mapred'
>    'group': 'mapred'
>  '/user/root':
>    'perms': '777'
>    'owner': 'root'
>  '/var/log':
>    'perms': '1775'
>    'owner': 'yarn'
>    'group': 'mapred'
>  '/var/log/hadoop/yarn/apps':
>    'perms': '1777'
>    'owner': 'yarn'
>    'group': 'mapred'
> users:
>  'tom':
>    'perms': '755'
>  'alice':
>    'perms': '755'
>  'bigtop':
>    'perms': '755'
