Today, I will be playing the role of the fool/jester trying to get Myriad running. I got Myriad running with Santosh quite a while ago and am now trying again with new versions of Hadoop, MapR, and Myriad, so I wanted to go through the wiki (https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Home) and outline points that, as a non-dev who doesn't live in the code, I find unclear when trying to use Myriad or understand its operation.
Obviously, some of my points can be answered with "look here in the code" or "look at this page", but I will try to outline my thought process as I reviewed the current docs. Sometimes the way I approached a problem led me down a path to a certain page, missing the answer on a different page, so some cross-linking could be helpful. Please do not take my points as anything other than a desire to improve how accessible Myriad is to the community; this is not a critique of the hard work everyone has done on the project. I also understand that, given the workload and other issues, fixing these documentation issues may not be a priority. I am listing them here so that the folks who are SMEs on various points can quickly add material, and we'll organize it later.

*Remote Distribution:*
https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Remote+Distribution

This whole section could use some work from the standpoint of what runs where and where each component gets its files. For example, I think it would help people to understand that the tarball created in step 6 has all the files for both node managers and resource managers. Basically, everything runs from there. Here is a small example I am currently working with:

Starting Myriad:

Option 1: Use Marathon (provide example json, here is mine)

    {
      "cmd": "env && export YARN_RESOURCEMANAGER_OPTS=-Dyarn.resourcemanager.hostname=myriad.marathon.mesos && hadoop-2.7.0/bin/yarn resourcemanager",
      "uris": ["maprfs:///mesos/myriad/hadoop-2.7.0.tar.gz"],
      "cpus": 1.0,
      "mem": 1024,
      "id": "myriad",
      "instances": 1,
      "user": "mapr"
    }

In this case, Marathon pulls down the Hadoop tarball, which also contains the Myriad yml file. When Marathon executes the resource manager, it comes up with Myriad and is ready to run node managers by pulling the tarball to the slave nodes and executing the nodemanager. (I would imagine the work with the history server etc. would also use this tarball?)
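For anyone who would rather generate the Marathon app definition than hand-edit it, here is a minimal sketch in Python that rebuilds the JSON from the Option 1 example and prints it. The function name build_myriad_app and its parameters are made up for illustration and are not part of Myriad or Marathon; the field values are simply the ones from the example.

```python
import json


def build_myriad_app(hadoop_version="2.7.0",
                     tarball_uri="maprfs:///mesos/myriad/hadoop-2.7.0.tar.gz",
                     rm_hostname="myriad.marathon.mesos",
                     user="mapr"):
    """Return a Marathon app definition (as a dict) that starts the YARN
    resource manager from the unpacked Hadoop/Myriad tarball.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Same command string as the hand-written JSON, with the hostname and
    # Hadoop version pulled out as parameters.
    cmd = (
        "env && export YARN_RESOURCEMANAGER_OPTS="
        f"-Dyarn.resourcemanager.hostname={rm_hostname} && "
        f"hadoop-{hadoop_version}/bin/yarn resourcemanager"
    )
    return {
        "cmd": cmd,
        "uris": [tarball_uri],  # Marathon fetches this tarball before running cmd
        "cpus": 1.0,
        "mem": 1024,
        "id": "myriad",
        "instances": 1,
        "user": user,
    }


if __name__ == "__main__":
    # Print the app definition so it can be saved to a file.
    print(json.dumps(build_myriad_app(), indent=2))
```

The printed JSON can be saved to a file and submitted to Marathon the same way as a hand-written one. Keeping the Hadoop version and tarball URI as parameters is only a convenience for when the versions change again.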
Once the resource manager is up, it will use nmInstances to launch a node manager. (Note: this is different from when I originally set things up. Before, I could run the resource manager/Myriad without a node manager; now it seems one is required, based on the config in the src. Could we expound on this in the docs somewhere?)

Option 2: ???? (Are there other ways to launch the resource manager?)

Step 6: Something that is unclear to me is the handling of the hadoop/yarn config files. In Step 6 on this page, there is "sudo rm hadoop-2.5.0/etc/hadoop/*.xml". This doesn't make sense to me, and I actually ignored this step. If I remove these xml files, then there is no place to get my files... I think? Since I am running the RM and NM from the same tarball, the Myriad config is there too, and my goal is to not have anything installed on a node, where would I set yarn settings? This could be much clearer to me, and probably to others.

Step 2: Should we just be copying the Myriad files to the /share/hadoop/yarn/lib folder? Do we worry about potential overwrites of jars or version conflicts?

*Configuring Cgroups*
https://cwiki.apache.org/confluence/display/MYRIAD/Configuring+Cgroups

At some point, it would help to have a little more about why one would want cgroups and what issues could occur with them. While many folks using Mesos/Myriad may understand this, others may not, and educating people along the way is a good way to help them think positively about our project.

Minor point on enabling cgroups: this is confusing given my questions about remote distribution. This page says I need to edit my yarn-site.xml, but remote distribution says to delete my hadoop xml files.
We need to address this conflict, because it can be confusing for a user coming on board.

Nitpick: "Enabling cgroups for mess-slave" should be "Enabling cgroups for mesos-slave".

*Myriad Configuration Properties*:
https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Configuration+Properties

Based on the conversation on list with Yuliya, the statement "Currently, this file is built into Myriad Scheduler jar. So, if you need to modify some of the properties in this file, modify them before building Myriad Scheduler." isn't accurate any more, and we should address that. The configuration file in the wiki is an old one; nmInstances isn't in it (see my question about that above).

Frameworks and usernames: I think the question of which users the framework, the actual node and resource managers, etc. run as is confusing to a user (I am very confused!). When I first got Myriad up, I set my user under the executor to be mapr, and it then appeared to work with impersonation from queries etc. Now I am trying the remote distribution, and I have users set in the config, potentially a user in my Marathon json, and I am getting errors on file permissions when a node manager tries to start (a separate issue I will post later). Basically, this is complex, and a page describing what needs to run where, with which permissions, and how that all interacts would be huge for people looking to put this into play.

*Example Yarn Site:*
https://cwiki.apache.org/confluence/display/MYRIAD/Example%3A+yarn-site.xml

This is helpful, but where does it go? Remember, the remote distribution had us delete the yarn-site in the hadoop etc folder.

*Myriad Webapp*
https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Webapp

This should be fleshed out a bit more. Also, based on my git clone it's in /myriad-scheduler/src/main/resources/webapp, but that isn't listed in the wiki. I had to dig for it. Some questions here: could the webapp be built during the Myriad build process?
Could it then be packaged as a tarball for execution, either manually via Marathon or automatically in a container on Mesos? I understand this is a fresh piece of the puzzle; I am just thinking about and verbalizing the "where" on this for the future.

Those are the items that come to mind thus far. I hope the tone of my email is correct; this is a great project, and I want others to try it as I have.

John Omernik