Sounds good to me: the roles are definitely the way to go.

1) I suggest keeping JSON, but I won't turn down a patch that switches to YAML, so long as the docs are updated accordingly as well. This might not be worth it, though.
2) init-hdfs.sh should over time be deleted and replaced with a pure HCFS implementation, so maybe we can do that as part of this work?

> On Sep 14, 2015, at 8:37 PM, Jonathan Kelly <[email protected]> wrote:
>
> Hey, all,
>
> I'm working on some improvements to init-hcfs.groovy in order to make
> init-hcfs.json more maintainable and also add the ability to make it more
> configurable (i.e., only create directories for the apps that are
> installed, if using the newly merged roles feature), but before I get too
> far I'd like to run my ideas by the community.
>
> Here is how I'm thinking it will work:
>
> 1. In the site.yaml, you'll specify some new properties that look
>    something like [1].
>    - Alternatively, rather than having to specify the directories via
>      site.yaml, maybe we can define a new resource type representing an
>      HDFS directory and declare all required HDFS directories for each
>      app in their own manifest files. (I think this is the spirit behind
>      https://issues.apache.org/jira/browse/BIGTOP-1772?) But in order to
>      do this we'd need some Puppet magic that can aggregate all of these
>      resources and only call init-hdfs once, passing in all of these
>      resources. Is this even possible? I know resource collectors can be
>      used to order all of these "hdfs_dir" resources before the
>      "init_hdfs" exec resource, but I don't know how you'd make the
>      "init_hdfs" exec resource operate on all of the collected
>      "hdfs_dir" resources, if I'm making any sense.
> 2. The hadoop::init_hdfs Puppet class will write out a file called
>    /var/lib/hadoop-hdfs/init-hcfs.yaml that looks something like [2]
>    (very similar to [1] of course).
>    - Note that I think it might be best to use YAML instead of JSON,
>      since YAML files are much easier to write out using a template. On
>      the other hand, Groovy doesn't have built-in support for YAML like
>      it has for JSON, so it might be more difficult to modify
>      init-hcfs.groovy to consume this YAML file, or at least I'd need to
>      add SnakeYAML as a dependency. Any thoughts on this?
>    - If I do add a dependency on SnakeYAML, it would probably also be
>      good to implement https://issues.apache.org/jira/browse/BIGTOP-1871
>      along with this change so that init-hcfs.groovy can have its own
>      install location rather than being part of the hadoop-hdfs package.
> 3. When calling init-hdfs.sh, the hadoop::init_hdfs Puppet class will
>    pass it this newly generated /var/lib/hadoop-hdfs/init-hcfs.yaml file,
>    which it will pass through to init-hcfs.groovy.
> 4. Finally, init-hcfs.groovy will be changed to read from this YAML file
>    in the format below instead of the hardcoded JSON file we have been
>    using, assuming the community is OK with this file format change.
>
> Thanks,
> Jonathan Kelly
>
> [1]
> hadoop::init_hdfs::hdfs_root_user: hdfs
> hadoop::init_hdfs::dirs:
>   /tmp:
>     perms: 1777
>   /user:
>     perms: 755
>     owner: ${hadoop::init_hdfs::hdfs_root_user}
>   /user/root:
>     perms: 777
>     owner: root
>   /var/log:
>     perms: 1775
>     owner: yarn
>     group: mapred
>   /tmp/hadoop-yarn:
>     perms: 777
>     owner: mapred
>     group: mapred
>   /var/log/hadoop/yarn/apps:
>     perms: 1777
>     owner: yarn
>     group: mapred
>   /user/history:
>     perms: 755
>     owner: mapred
>     group: mapred
> hadoop::init_hdfs::users:
>   tom:
>     perms: 755
>   alice:
>     perms: 755
>   bigtop:
>     perms: 755
>
> [2]
> ---
> root_user: 'hdfs'
> dirs:
>   '/tmp':
>     'perms': '1777'
>   '/tmp/hadoop-yarn':
>     'perms': '777'
>     'owner': 'mapred'
>     'group': 'mapred'
>   '/user':
>     'perms': '755'
>     'owner': 'hdfs'
>   '/user/history':
>     'perms': '755'
>     'owner': 'mapred'
>     'group': 'mapred'
>   '/user/root':
>     'perms': '777'
>     'owner': 'root'
>   '/var/log':
>     'perms': '1775'
>     'owner': 'yarn'
>     'group': 'mapred'
>   '/var/log/hadoop/yarn/apps':
>     'perms': '1777'
>     'owner': 'yarn'
>     'group': 'mapred'
> users:
>   'tom':
>     'perms': '755'
>   'alice':
>     'perms': '755'
>   'bigtop':
>     'perms': '755'
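
To make the proposed format in [2] a bit more concrete, here is a rough sketch of how a consumer might turn such a spec into an ordered list of steps. It is written in Python for brevity, not Groovy, and it only builds `hadoop fs` command lines rather than touching a filesystem, so it is not how init-hcfs.groovy works today. The function name `plan_commands`, the fallback of a missing owner to `root_user`, and the mapping of each `users` entry to `/user/<name>` are assumptions made for illustration; the proposal does not spell these out.

```python
# Hypothetical spec shaped like example [2], trimmed to a few entries.
SPEC = {
    "root_user": "hdfs",
    "dirs": {
        "/tmp": {"perms": "1777"},
        "/tmp/hadoop-yarn": {"perms": "777", "owner": "mapred", "group": "mapred"},
        "/user": {"perms": "755", "owner": "hdfs"},
        "/user/history": {"perms": "755", "owner": "mapred", "group": "mapred"},
    },
    "users": {
        "tom": {"perms": "755"},
        "alice": {"perms": "755"},
    },
}


def plan_commands(spec):
    """Build `hadoop fs` argument lists for a spec, parents before children."""
    root_user = spec.get("root_user", "hdfs")
    entries = dict(spec.get("dirs", {}))

    # Assumption: each user gets a home directory /user/<name> that they own.
    for name, attrs in spec.get("users", {}).items():
        entries["/user/" + name] = {"owner": name, **attrs}

    commands = []
    # Sort by depth, then by name, so a parent is always handled first.
    for path in sorted(entries, key=lambda p: (p.count("/"), p)):
        attrs = entries[path]
        commands.append(["hadoop", "fs", "-mkdir", "-p", path])
        if "perms" in attrs:
            commands.append(["hadoop", "fs", "-chmod", str(attrs["perms"]), path])
        # Assumption: a directory without an explicit owner belongs to root_user.
        owner = attrs.get("owner", root_user)
        group = attrs.get("group")
        ownership = f"{owner}:{group}" if group else owner
        commands.append(["hadoop", "fs", "-chown", ownership, path])
    return commands


if __name__ == "__main__":
    for cmd in plan_commands(SPEC):
        print(" ".join(cmd))
```

The same dictionary shape comes out of either a JSON or a YAML parser, so the choice of file format would only affect the loading step, not this part.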
