Today, I will be playing the role of the fool/jester trying to get Myriad
running. I got Myriad running with Santosh quite a while ago, and I am now
trying again with new versions of Hadoop, MapR, and Myriad. So I wanted to
go through the wiki (
https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Home) and outline
points that, to a non-dev who is not living in the code, are unclear when
trying to use Myriad or understand its operation.

Obviously, some of my points can be answered with "look here in the code"
or look at this page, but I will try to outline my thought processes as I
reviewed the current docs.  Sometimes the way I approached the problem led
me down a path to a certain page, and I missed the answer on a different
page, so some cross linking could be helpful.

Please do not take my points as anything other than a desire to improve
how accessible Myriad is to the community; this is not a critique of the
hard work everyone has done on the project.  I also understand that, given
the work load and other issues, fixing these issues in the documentation
may not be a priority.  I am listing them out here so that those folks who
are SMEs on various points may be able to quickly add stuff, and we'll
organize it later.


*Remote Distribution:*
https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Remote+Distribution

This whole section could use some work from a standpoint of what runs where
and where that component gets its files.  For example, I think it would
help people to understand that the whole tarball created in step 6 has all
the files for node managers and resource managers.  Basically, everything
runs from there. Here is a small example I am currently working with:


Starting Myriad:
Option 1: Use Marathon (provide example json, here is mine)
{
"cmd": "env && export YARN_RESOURCEMANAGER_OPTS=-Dyarn.resourcemanager.hostname=myriad.marathon.mesos && hadoop-2.7.0/bin/yarn resourcemanager",
"uris": ["maprfs:///mesos/myriad/hadoop-2.7.0.tar.gz"],
"cpus": 1.0,
"mem": 1024,
"id": "myriad",
"instances": 1,
"user": "mapr"
}

In this case, Marathon pulls down the hadoop tarball, which also contains
the Myriad yml file. When it executes the resource manager, Myriad comes up
with it and is ready to run node managers by pulling the tarball to the
slave nodes and executing the nodemanager.  (I would imagine the work with
the history server etc. would also use this tarball?)
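
For anyone who wants to script this, here is a rough sketch of how the app
definition above could be built and prepared for submission to Marathon's
/v2/apps endpoint. The Marathon host and port, and the function names, are
my own placeholders and not something from the wiki. The sketch only
prepares the request; it does not send it.

```python
import json
import urllib.request

# Hypothetical Marathon endpoint; adjust host and port for your cluster.
MARATHON_URL = "http://marathon.example.com:8080/v2/apps"


def build_myriad_app():
    """Return the Marathon app definition from the example above as a dict."""
    # The command must be a single string; joining the pieces here avoids
    # the line-wrapping problem that breaks hand-edited JSON.
    cmd = " && ".join([
        "env",
        "export YARN_RESOURCEMANAGER_OPTS="
        "-Dyarn.resourcemanager.hostname=myriad.marathon.mesos",
        "hadoop-2.7.0/bin/yarn resourcemanager",
    ])
    return {
        "cmd": cmd,
        "uris": ["maprfs:///mesos/myriad/hadoop-2.7.0.tar.gz"],
        "cpus": 1.0,
        "mem": 1024,
        "id": "myriad",
        "instances": 1,
        "user": "mapr",
    }


def build_request(app, url=MARATHON_URL):
    """Prepare (but do not send) a POST carrying the app definition as JSON."""
    body = json.dumps(app).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_request(build_myriad_app())
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually submit: urllib.request.urlopen(request)
```

Building the definition in code rather than by hand keeps the cmd string on
one line, which matters because JSON strings cannot contain raw newlines.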

From here it will use NMInstances to launch a node manager.  (Note, this is
different from when I originally set things up. Before, I could run the
resource manager/myriad without a nodemanager; now it seems one is required
based on the config in the src. Could we expound on this in the docs
somewhere?)


Option 2: ???? (Are there other ways to launch the resource manager?)

Step 6: Something that is unclear to me is the handling of the hadoop/yarn
config files.  In Step 6 on this page, there is
"sudo rm hadoop-2.5.0/etc/hadoop/*.xml".  This doesn't make sense to me. I
actually ignored this step.  For me, if I remove these xml files, then
there is no place to get my files... I think? Since I am running the RM and
NM from the same tarball, and the Myriad config is here, and my goal is to
not have anything installed on a node, where would I set yarn settings?
This could be much clearer to me, and probably others.

Step 2:  Should we just be copying the Myriad files to the
/share/hadoop/yarn/lib folder? Do we worry about potential overwrites of
jars or version conflicts?
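
To make the overwrite/version question concrete, here is a small sketch that
compares two lists of jar file names and reports exact-name overwrites and
same-artifact-different-version conflicts. The jar names in the demo are
invented for illustration, and the rule for splitting a name from its
version (version starts at the first "-<digit>") is my own assumption.

```python
import re
from collections import defaultdict

# Splits "name-1.2.3.jar" into ("name", "1.2.3"); the version is assumed to
# start at the first "-<digit>" in the file name.
_JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[^/]*)\.jar$")


def split_jar(filename):
    """Return (artifact_name, version); version is None if none is found."""
    match = _JAR_PATTERN.match(filename)
    if match is None:
        name = filename[:-4] if filename.endswith(".jar") else filename
        return name, None
    return match.group("name"), match.group("version")


def find_conflicts(existing_jars, incoming_jars):
    """Return (overwrites, version_conflicts).

    overwrites: file names present in both lists.
    version_conflicts: {artifact: (incoming_version, [existing_versions])}
    for artifacts that already exist with a different version.
    """
    existing_versions = defaultdict(set)
    for jar in existing_jars:
        name, version = split_jar(jar)
        existing_versions[name].add(version)

    overwrites = sorted(set(existing_jars) & set(incoming_jars))
    conflicts = {}
    for jar in incoming_jars:
        name, version = split_jar(jar)
        others = existing_versions.get(name, set()) - {version}
        if others:
            conflicts[name] = (version, sorted(others, key=str))
    return overwrites, conflicts


if __name__ == "__main__":
    # Hypothetical directory listings, not taken from a real install.
    existing = ["guava-11.0.2.jar", "jackson-core-asl-1.9.13.jar"]
    incoming = [
        "guava-16.0.1.jar",
        "jackson-core-asl-1.9.13.jar",
        "myriad-scheduler-0.0.1.jar",
    ]
    print(find_conflicts(existing, incoming))
```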

*Configuring Cgroups*
https://cwiki.apache.org/confluence/display/MYRIAD/Configuring+Cgroups
At some point, it would help to add a little bit more about why one would
want CGroups and the issues that could occur with them. While many folks
using Mesos/Myriad may understand this, others may not, and it's a good way
to help people think positively about our project if we help educate them
along the way.

Minor point on enabling CGroups: this is confusing given my questions on
remote distribution. This page says I need to edit my yarn-site.xml, but
remote distribution says to delete my hadoop xml files. We need to address
this conflict because it can be confusing for a user coming onboard.
Nitpick: Enabling cgroups for mess-slave - should be - Enabling cgroups for
mesos-slave

*Myriad Configuration Properties*:
https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Configuration+Properties
Based on the conversation on list with Yuliya,
"Currently, this file is built into Myriad Scheduler jar. So, if you need
to modify some of the properties in this file, modify them before building
Myriad Scheduler."
isn't accurate any more, and we should address that.

The configuration file in the wiki is an old one; nmInstances isn't in it
(see my question about that above).

Frameworks and usernames.  I think the question of which users the
framework, the actual node managers and resource managers, etc. run as is
confusing to a user (I am very confused!).  When I first got Myriad up, I
set my user under the executor to be mapr, and then it appeared to work
with impersonation from queries etc.  Now I am trying the remote
distribution, and I have users set in the config, potentially a user in my
marathon json, and I am getting errors on permissions of files when a node
manager tries to start (a separate issue I will post later). Basically,
this is complex, and a page describing what needs to run where, with which
permissions, and how those interact would be a huge help for people looking
to put this into play.

*Example Yarn Site:*
https://cwiki.apache.org/confluence/display/MYRIAD/Example%3A+yarn-site.xml

This is helpful, but where does it go?  Remember, the remote distribution
had us delete the yarn-site in the hadoop etc folder.

*Myriad Webapp*
https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Webapp

This should be fleshed out a bit more.  Also, it's in the
/myriad-scheduler/src/main/resources/webapp based on my git clone, but in
the wiki that's not listed.  I had to dig for it.

Some questions here: could the webapp be built during the myriad build
process? Could it then be packaged as a tarball for execution, either
manually via marathon or automatically in a container on mesos?  I
understand this is a fresh piece of the puzzle; I am just thinking about
and verbalizing the "where" on this for the future.



Those are the items that come to mind thus far.  I hope the tone of my
email is correct; this is a great project, and I want others to try it as I
have.

John Omernik
