Re: Proposed init-hcfs improvements

Jonathan Kelly Thu, 17 Sep 2015 12:28:03 -0700

Thanks for the response, Jay.

Though a YAML file would look a bit nicer, I decided that it probably is
better to keep it as a JSON file because I really did not want to add a
dependency on SnakeYAML to init-hcfs.groovy. Also, one of my main reasons
for changing it to a YAML file was because I was generating it in a new
Puppet template (init-hcfs.yaml.tmpl), and it seemed like it was going to
be really difficult to write out a valid JSON file one element/field at a
time because it would need to put commas in all the correct places, etc.
However, I realized that I could just write some Ruby code in the template
that transforms the config from site.yaml into the expected format, then use
JSON.pretty_generate() to write it all out in a single call. So the template
(now called init-hcfs.json.tmpl) is actually pretty clean looking.
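
For readers following along, here is a minimal sketch of the kind of Ruby
such a template could contain. It is not the actual init-hcfs.json.tmpl,
which is not shown in this thread. The names site_config, stringify_attrs
and build_init_hcfs are hypothetical, the input hash is a hard-coded
stand-in for the values read from site.yaml, and the output layout is
assumed to mirror example [2] in the quoted proposal below.

```ruby
require 'json'

# Hypothetical stand-in for the values the template would read from
# site.yaml (key names loosely follow example [1] in the quoted proposal).
site_config = {
  'hdfs_root_user' => 'hdfs',
  'dirs' => {
    '/user/root' => { 'perms' => 777, 'owner' => 'root' },
    '/tmp'       => { 'perms' => 1777 },
    '/var/log'   => { 'perms' => 1775, 'owner' => 'yarn', 'group' => 'mapred' },
    '/user'      => { 'perms' => 755, 'owner' => 'hdfs' }
  },
  'users' => {
    'tom'   => { 'perms' => 755 },
    'alice' => { 'perms' => 755 }
  }
}

# Convert every attribute value to a string so that perms such as 1777
# are emitted as "1777" (assumption: the consumer expects strings).
def stringify_attrs(entries)
  entries.map { |name, attrs| [name, attrs.transform_values(&:to_s)] }.to_h
end

# Build the structure to serialize. Sorting dirs by path lists each
# parent directory before its children.
def build_init_hcfs(config)
  {
    'root_user' => config.fetch('hdfs_root_user'),
    'dirs'      => stringify_attrs(config.fetch('dirs').sort_by { |path, _| path }),
    'users'     => stringify_attrs(config.fetch('users', {}))
  }
end

# One call produces valid JSON, with all commas and nesting handled.
json_text = JSON.pretty_generate(build_init_hcfs(site_config))
puts json_text
```

In a real Puppet template the same logic would presumably sit inside ERB
tags, with only the JSON.pretty_generate() result being emitted.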


~ Jonathan

On Mon, Sep 14, 2015 at 8:00 PM, Jay Vyas <[email protected]>
wrote:

> Sounds good to me: the roles are definitely the way to go.
>
> 1) I suggest keeping JSON, but won't turn down a patch that updates to
> YAML so long as the docs are updated accordingly as well. This might not be
> worth it, though.
>
> 2) init-hdfs.sh should over time be deleted and replaced with a pure HCFS
> implementation, so maybe we can do that as part of this work?
>
> > On Sep 14, 2015, at 8:37 PM, Jonathan Kelly <[email protected]>
> wrote:
> >
> > Hey, all,
> >
> > I'm working on some improvements to init-hcfs.groovy in order to make
> > init-hcfs.json more maintainable and also add the ability to make it more
> > configurable (i.e., only create directories for the apps that are
> > installed, if using the newly merged roles feature), but before I get too
> > far I'd like to run my ideas by the community.
> >
> > Here is how I'm thinking it will work:
> >
> >   1. In the site.yaml, you'll specify some new properties that look
> >   something like [1].
> >      - Alternatively, rather than having to specify the directories via
> >      site.yaml, maybe we can define a new resource type representing an
> >      HDFS directory and declare all required HDFS directories for each
> >      app in their own manifest files. (I think this is the spirit behind
> >      https://issues.apache.org/jira/browse/BIGTOP-1772?) But in order to
> >      do this we'd need some Puppet magic that can aggregate all of these
> >      resources and only call init-hdfs once, passing in all of these
> >      resources. Is this even possible? I know resource collectors can be
> >      used to order all of these "hdfs_dir" resources before the
> >      "init_hdfs" exec resource, but I don't know how you'd make the
> >      "init_hdfs" exec resource operate on all of the collected
> >      "hdfs_dir" resources, if I'm making any sense.
> >   2. The hadoop::init_hdfs Puppet class will write out a file called
> >   /var/lib/hadoop-hdfs/init-hcfs.yaml that looks something like [2] (very
> >   similar to [1] of course).
> >      - Note that I think it might be best to use YAML instead of JSON,
> >      since YAML files are much easier to write out using a template. On
> >      the other hand, Groovy doesn't have built-in support for YAML like
> >      it has for JSON, so it might be more difficult to modify
> >      init-hcfs.groovy to consume this YAML file, or at least I'd need to
> >      add SnakeYAML as a dependency. Any thoughts on this?
> >      - If I do add a dependency on SnakeYAML, it would probably also be
> >      good to implement https://issues.apache.org/jira/browse/BIGTOP-1871
> >      along with this change so that init-hcfs.groovy can have its own
> >      install location rather than being part of the hadoop-hdfs package.
> >   3. When calling init-hdfs.sh, the hadoop::init_hdfs Puppet class will
> >   pass it this newly generated /var/lib/hadoop-hdfs/init-hcfs.yaml file,
> >   which it will pass through to init-hcfs.groovy.
> >   4. Finally, init-hcfs.groovy will be changed to read from this YAML
> >   file in the format below instead of the hardcoded JSON file we have
> >   been using, assuming the community is OK with this file format change.
> >
> > Thanks,
> > Jonathan Kelly
> >
> > [1]
> > hadoop::init_hdfs::hdfs_root_user: hdfs
> > hadoop::init_hdfs::dirs:
> >  /tmp:
> >    perms: 1777
> >  /user:
> >    perms: 755
> >    owner: ${hadoop::init_hdfs::hdfs_root_user}
> >  /user/root:
> >    perms: 777
> >    owner: root
> >  /var/log:
> >    perms: 1775
> >    owner: yarn
> >    group: mapred
> >  /tmp/hadoop-yarn:
> >    perms: 777
> >    owner: mapred
> >    group: mapred
> >  /var/log/hadoop/yarn/apps:
> >    perms: 1777
> >    owner: yarn
> >    group: mapred
> >  /user/history:
> >    perms: 755
> >    owner: mapred
> >    group: mapred
> > hadoop::init_hdfs::users:
> >  tom:
> >    perms: 755
> >  alice:
> >    perms: 755
> >  bigtop:
> >    perms: 755
> >
> > [2]
> > ---
> > root_user: 'hdfs'
> > dirs:
> >  '/tmp':
> >    'perms': '1777'
> >  '/tmp/hadoop-yarn':
> >    'perms': '777'
> >    'owner': 'mapred'
> >    'group': 'mapred'
> >  '/user':
> >    'perms': '755'
> >    'owner': 'hdfs'
> >  '/user/history':
> >    'perms': '755'
> >    'owner': 'mapred'
> >    'group': 'mapred'
> >  '/user/root':
> >    'perms': '777'
> >    'owner': 'root'
> >  '/var/log':
> >    'perms': '1775'
> >    'owner': 'yarn'
> >    'group': 'mapred'
> >  '/var/log/hadoop/yarn/apps':
> >    'perms': '1777'
> >    'owner': 'yarn'
> >    'group': 'mapred'
> > users:
> >  'tom':
> >    'perms': '755'
> >  'alice':
> >    'perms': '755'
> >  'bigtop':
> >    'perms': '755'
>
