[ https://issues.apache.org/jira/browse/BIGTOP-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355193#comment-14355193 ]
Richard Pelavin commented on BIGTOP-1746:
-----------------------------------------

One approach to look at is a simple topology DSL, say in YAML, that the end user interacts with to specify the logical set of nodes, which daemons run on each node, and how they are interconnected (for more complex things like HA, or to indicate how monitoring, user authentication, etc. could plug in). It would then be easy to write some simple code to "compile" this into, for example, hiera.yaml files, an ENC, or even both.

I think this raises the question of what type of end user is expected to use this. If it is a Puppet-savvy end user, then having them specify things in Hiera would be clear; but if we are going after end users who are not well versed in Puppet, a "Bigtop topology DSL" may resonate better and can be simpler.

I have done a lot of work in this area, so I would be happy to propose a starting point for a topology DSL if this approach makes sense. I can also flesh out a number of issues that could be addressed, to see what priority people would give them.

One issue, for example, is whether the topology just logically identifies nodes and groups of nodes (e.g., the set of slaves) and does not require IP or DNS addresses to be assigned to them; this allows more shareable designs without locking users into what would be one team's deployment-specific settings. It also facilitates a process where, given a topology, we spin up a set of nodes and, in a late-binding way, attach the host addresses to the nodes and to the attributes used for connecting hosts.

For the specific example of init-hdfs.sh, if I am guessing correctly at what the issue would be, ideally you want to create directories only for the services that are actually being used, not for all of them; or, equivalently, for the data-driven way in which this is created, you want to construct the description of directories to include as a function of which daemons are on the topology nodes.
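
To make the idea concrete, here is a minimal sketch in Python of what such a compile step could look like. Everything in it is an assumption for illustration: the topology layout, the key names (`roles`, `namenode_hosts`), the directory paths, and the function names are hypothetical and are not Bigtop's actual Hiera keys or an existing DSL. The topology is written as the dict a YAML loader would return, so the sketch needs only the standard library.

```python
import json

# Hypothetical topology, as a YAML loader would return it.
# Nodes are logical names only; no IP or DNS addresses appear here.
TOPOLOGY = {
    "nodes": {
        "master": {"daemons": ["namenode", "resourcemanager"]},
        "worker": {"daemons": ["datanode", "nodemanager"], "count": 2},
        "gateway": {"daemons": ["nodemanager", "hadoop-client"]},
    },
}

# Hypothetical mapping from daemon to the HDFS directories it needs.
DAEMON_DIRS = {
    "namenode": ["/tmp"],
    "resourcemanager": ["/tmp/hadoop-yarn"],
    "nodemanager": ["/var/log/hadoop-yarn"],
    "hbase-master": ["/hbase"],
}


def expand_nodes(topology):
    """Expand node groups into individual logical node names."""
    nodes = {}
    for name, spec in topology["nodes"].items():
        count = spec.get("count", 1)
        for i in range(count):
            node_name = name if count == 1 else f"{name}-{i + 1}"
            nodes[node_name] = list(spec["daemons"])
    return nodes


def bind_hosts(nodes, addresses):
    """Late binding: attach host addresses to logical nodes after spin-up."""
    missing = sorted(set(nodes) - set(addresses))
    if missing:
        raise ValueError(f"no address bound for: {', '.join(missing)}")
    return {addresses[name]: daemons for name, daemons in nodes.items()}


def compile_node_data(bound):
    """Emit per-host data in a Hiera-like shape (key names are made up)."""
    namenode_hosts = [h for h, d in bound.items() if "namenode" in d]
    return {
        host: {"roles": daemons, "namenode_hosts": namenode_hosts}
        for host, daemons in bound.items()
    }


def required_dirs(nodes):
    """Directories to create, as a function of the daemons in the topology."""
    daemons = {d for ds in nodes.values() for d in ds}
    return sorted({p for d in daemons for p in DAEMON_DIRS.get(d, [])})


if __name__ == "__main__":
    nodes = expand_nodes(TOPOLOGY)
    # Addresses are supplied only at deployment time (example values).
    addresses = {
        "master": "10.0.0.1",
        "worker-1": "10.0.0.2",
        "worker-2": "10.0.0.3",
        "gateway": "10.0.0.4",
    }
    print(json.dumps(compile_node_data(bind_hosts(nodes, addresses)), indent=2))
    print(required_dirs(nodes))
```

Because the topology itself carries no addresses, the same description can be shared between teams and bound to hosts at deployment time. Note also that `/hbase` is not in the directory list here, since no node in this topology runs `hbase-master`.
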
This is something I have tackled, and I can include it in a write-up if there is interest.

> Introduce the concept of roles in bigtop cluster deployment
> -----------------------------------------------------------
>
> Key: BIGTOP-1746
> URL: https://issues.apache.org/jira/browse/BIGTOP-1746
> Project: Bigtop
> Issue Type: New Feature
> Components: deployment
> Reporter: vishnu gajendran
> Labels: features
> Fix For: 0.9.0
>
>
> Currently, during cluster deployment, puppet categorizes nodes as head_node,
> worker_nodes, gateway_nodes, standy_node based on user specified info. This
> functionality gives the user control over picking a particular node as
> head_node, standy_node, gateway_node and the rest as worker_nodes. But I
> would like to have more fine-grained control over which daemons should run on
> which node. For example, I do not want to run namenode, datanode on the same
> node. This functionality can be introduced with the concept of roles. Each
> node can be assigned a set of roles. For example, Node A can be assigned
> ["namenode", "resourcemanager"] roles. Node B can be assigned ["datanode",
> "nodemanager"] and Node C can be assigned ["nodemanager", "hadoop-client"].
> Now, each node will only run the specified daemons. A prerequisite for this
> kind of deployment is that each node should be given the necessary
> configuration that it needs to know. For example, each datanode should know
> which node is the namenode, etc. This functionality will allow users to customize
> the cluster deployment according to their needs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)