I promised I'd provide a Marathon JSON for this; I'll update the wiki with
a more complete one later next week. These are retyped by hand and may have
errors, but here's the idea:

{
  "id": "resource-manager",
  "uris": ["hdfs://namenode:port/dist/hadoop-2.7.0.tgz",
             "hdfs://namenode:port/dist/conf/hadoop/yarn-site.xml",
             "hdfs://namenode:port/dist/conf/hadoop/hdfs-site.xml",
             "hdfs:///dist/conf/hadoop/core-site.xml",
             "hdfs://namenode:port/dist/conf/hadoop/mapred-site.xml"],
  "cmd": "cp *.xml hadoop-2.7.0/etc/hadoop && cd hadoop-2.7.0 && bin/yarn resourcemanager",
  "mem": 16,
  "instances" : 1,
  "user": "yarn"
}

This has a disadvantage of not having dynamic ports and getting specific
ports is really hit or miss as I haven't found a good way to protect them
from other frameworks on the same role (other than inventing a new
resource).  So I use ports outside the range I give mesos.  I do use
mesos-dns and set the yarn.resourcemanager.hostname to
resource-manager.marathon.mesos.
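
For reference, the relevant yarn-site.xml fragment (the surrounding
<configuration> element omitted) is just:

```xml
<!-- yarn-site.xml: point everything at the mesos-dns name of the RM task -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>resource-manager.marathon.mesos</value>
</property>
```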

One of the reasons I like the option of getting the configs from the
resource manager is I'm experimenting with:

{
  "id": "resource-manager",
  "uris": ["hdfs://namenode:port/dist/hadoop-2.7.0.tgz",
             "hdfs://namenode:port/dist/conf/hadoop/yarn-site.xml",
             "hdfs://namenode:port/dist/conf/hadoop/hdfs-site.xml",
             "hdfs:///dist/conf/hadoop/core-site.xml",
             "hdfs://namenode:port/dist/conf/hadoop/mapred-site.xml"],
  "ports": [0,0,0],
  "cmd": "cp *.xml hadoop-2.7.0/etc/hadoop && cd hadoop-2.7.0 && bin/yarn resourcemanager",
  "env": {
    "YARN_OPTS": "-Dyarn.resourcemanager.webapp.address=resource-manager.marathon.mesos:$PORT0 -Dyarn.resourcemanager.resource-tracker.address=resource-manager.marathon.mesos:$PORT1 -Dyarn.resourcemanager.admin.address=resource-manager.marathon.mesos:$PORT2"
  },
  "mem": 16,
  "instances" : 1,
  "user": "yarn"
}
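
What makes this work is that Marathon injects the randomly assigned ports
into the task's environment as PORT0, PORT1, etc.  As a quick illustration
of how the YARN_OPTS string above gets its values (plain Python, with
made-up port numbers standing in for whatever Marathon would assign):

```python
import os

# Simulate the environment Marathon sets up for the task.
# These port values are made up for illustration.
os.environ["PORT0"] = "31001"
os.environ["PORT1"] = "31002"
os.environ["PORT2"] = "31003"

HOST = "resource-manager.marathon.mesos"  # the mesos-dns name from above

# Build the same -D overrides the "env" block hands to YARN_OPTS.
props = {
    "yarn.resourcemanager.webapp.address": os.environ["PORT0"],
    "yarn.resourcemanager.resource-tracker.address": os.environ["PORT1"],
    "yarn.resourcemanager.admin.address": os.environ["PORT2"],
}
yarn_opts = " ".join(f"-D{name}={HOST}:{port}" for name, port in props.items())
print(yarn_opts)
```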

I did a similar type of thing with the Hadoop 1 and Storm frameworks, and it
works pretty well, but there are some challenges unique to Myriad here.

Since the NM config is pulled from yarn.resourcemanager.webapp.address,
everything gets configured correctly.  However, there's presently no way to
update the NMs' config after startup if it changes (in the Hadoop 1 and
Storm world there's no high availability, so those tasks just die, making
this a non-issue).  A possible solution is HAProxy, and that's probably OK
for the amount of data going from the NMs to the RM, but it's annoying.
I'm open to suggestions here, as this seems tricky without an external
service-discovery tool or extending the NodeManager class.
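
For the HAProxy route, I'm picturing something like the following on each
NM host (a hypothetical fragment, not something I've deployed; 8031 is the
default resource-tracker port, and the backend line is the only thing that
would need rewriting when the RM task moves):

```
# haproxy.cfg fragment (sketch): NMs point at a fixed local port,
# and only the backend changes when the RM task moves.
listen yarn-rm-tracker
    bind 127.0.0.1:8031
    mode tcp
    server rm resource-manager.marathon.mesos:8031 check
```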

There's also the matter of dealing with two RMs in this context for HA.  I
believe I could implement some ZooKeeper logic for the initial sync, but I
have no idea how to handle updates if an RM moves, other than HAProxy or
the like.  Again, open to suggestions.
