Thanks for the response, Jay. Though a YAML file would look a bit nicer, I decided it is probably better to keep it as a JSON file, because I really did not want to add a dependency on SnakeYAML to init-hcfs.groovy. One of my main reasons for changing it to a YAML file was that I was generating it in a new Puppet template (init-hcfs.yaml.tmpl), and it seemed like it was going to be really difficult to write out a valid JSON file one element/field at a time, since the template would need to put commas in all the correct places, etc. However, I realized that I could just write some Ruby code in the template that transforms the config from site.yaml into the expected format and then use JSON.pretty_generate() to write it all out in a single call. So the template (now called init-hcfs.json.tmpl) is actually pretty clean looking.
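A minimal sketch of that approach in plain Ruby, not the actual template: the helper name init_hcfs_config and the sample values are hypothetical, and the key names (root_user, dirs, users) are assumed from example [2] in the quoted mail below rather than taken from the final file format.

```ruby
require 'json'

# Hypothetical helper: assemble the structure that init-hcfs.groovy would read.
# The key names are assumptions borrowed from example [2], not a confirmed format.
def init_hcfs_config(root_user, dirs, users)
  {
    'root_user' => root_user,
    'dirs'      => dirs,
    'users'     => users
  }
end

# Stand-in values for what the template would look up from site.yaml.
dirs = {
  '/tmp'          => { 'perms' => '1777' },
  '/user'         => { 'perms' => '755', 'owner' => 'hdfs' },
  '/user/history' => { 'perms' => '755', 'owner' => 'mapred', 'group' => 'mapred' }
}
users = { 'tom' => { 'perms' => '755' } }

# A single call serializes the whole structure, so the template never has to
# place commas or closing braces by hand.
json_text = JSON.pretty_generate(init_hcfs_config('hdfs', dirs, users))
puts json_text
```

Inside an ERB template the same idea would presumably reduce to building the hash and emitting the JSON.pretty_generate result in one output tag.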
~ Jonathan

On Mon, Sep 14, 2015 at 8:00 PM, Jay Vyas <[email protected]> wrote:

> Sounds good to me: the roles are definitely the way to go.
>
> 1) I suggest keeping JSON, but won't turn down a patch that updates to
> YAML, so long as docs are fixed accordingly as well. This might not be
> worth it though.
>
> 2) init-hdfs.sh should over time be deleted and replaced with a pure HCFS
> implementation, so maybe we can do that as part of this work?
>
> On Sep 14, 2015, at 8:37 PM, Jonathan Kelly <[email protected]>
> wrote:
> >
> > Hey, all,
> >
> > I'm working on some improvements to init-hcfs.groovy in order to make
> > init-hcfs.json more maintainable and also make it more configurable
> > (i.e., only create directories for the apps that are installed, if
> > using the newly merged roles feature), but before I get too far I'd
> > like to run my ideas by the community.
> >
> > Here is how I'm thinking it will work:
> >
> >    1. In the site.yaml, you'll specify some new properties that look
> >    something like [1].
> >       - Alternatively, rather than having to specify the directories via
> >       site.yaml, maybe we can define a new resource type representing an
> >       HDFS directory and declare all required HDFS directories for each
> >       app in their own manifest files. (I think this is the spirit behind
> >       https://issues.apache.org/jira/browse/BIGTOP-1772?) But in order to
> >       do this we'd need some Puppet magic that can aggregate all of these
> >       resources and only call init-hdfs once, passing in all of these
> >       resources. Is this even possible? I know resource collectors can be
> >       used to order all of these "hdfs_dir" resources before the
> >       "init_hdfs" exec resource, but I don't know how you'd make the
> >       "init_hdfs" exec resource operate on all of the collected
> >       "hdfs_dir" resources, if I'm making any sense.
> >    2. The hadoop::init_hdfs Puppet class will write out a file called
> >    /var/lib/hadoop-hdfs/init-hcfs.yaml that looks something like [2]
> >    (very similar to [1] of course).
> >       - Note that I think it might be best to use YAML instead of JSON,
> >       since YAML files are much easier to write out using a template. On
> >       the other hand, Groovy doesn't have built-in support for YAML like
> >       it has for JSON, so it might be more difficult to modify
> >       init-hcfs.groovy to consume this YAML file, or at least I'd need to
> >       add SnakeYAML as a dependency. Any thoughts on this?
> >       - If I do add a dependency on SnakeYAML, it would probably also be
> >       good to implement https://issues.apache.org/jira/browse/BIGTOP-1871
> >       along with this change so that init-hcfs.groovy can have its own
> >       install location rather than being part of the hadoop-hdfs package.
> >    3. When calling init-hdfs.sh, the hadoop::init_hdfs Puppet class will
> >    pass it this newly generated /var/lib/hadoop-hdfs/init-hcfs.yaml file,
> >    which it will pass through to init-hcfs.groovy.
> >    4. Finally, init-hcfs.groovy will be changed to read from this YAML
> >    file in the format below instead of the hardcoded JSON file we have
> >    been using, assuming the community is OK with this file format change.
> >
> > Thanks,
> > Jonathan Kelly
> >
> > [1]
> > hadoop::init_hdfs::hdfs_root_user: hdfs
> > hadoop::init_hdfs::dirs:
> >   /tmp:
> >     perms: 1777
> >   /user:
> >     perms: 755
> >     owner: ${hadoop::init_hdfs::hdfs_root_user}
> >   /user/root:
> >     perms: 777
> >     owner: root
> >   /var/log:
> >     perms: 1775
> >     owner: yarn
> >     group: mapred
> >   /tmp/hadoop-yarn:
> >     perms: 777
> >     owner: mapred
> >     group: mapred
> >   /var/log/hadoop/yarn/apps:
> >     perms: 1777
> >     owner: yarn
> >     group: mapred
> >   /user/history:
> >     perms: 755
> >     owner: mapred
> >     group: mapred
> > hadoop::init_hdfs::users:
> >   tom:
> >     perms: 755
> >   alice:
> >     perms: 755
> >   bigtop:
> >     perms: 755
> >
> > [2]
> > ---
> > root_user: 'hdfs'
> > dirs:
> >   '/tmp':
> >     'perms': '1777'
> >   '/tmp/hadoop-yarn':
> >     'perms': '777'
> >     'owner': 'mapred'
> >     'group': 'mapred'
> >   '/user':
> >     'perms': '755'
> >     'owner': 'hdfs'
> >   '/user/history':
> >     'perms': '755'
> >     'owner': 'mapred'
> >     'group': 'mapred'
> >   '/user/root':
> >     'perms': '777'
> >     'owner': 'root'
> >   '/var/log':
> >     'perms': '1775'
> >     'owner': 'yarn'
> >     'group': 'mapred'
> >   '/var/log/hadoop/yarn/apps':
> >     'perms': '1777'
> >     'owner': 'yarn'
> >     'group': 'mapred'
> > users:
> >   'tom':
> >     'perms': '755'
> >   'alice':
> >     'perms': '755'
> >   'bigtop':
> >     'perms': '755'
