I promised I'd provide a Marathon JSON for this; I'll update the wiki with a more complete one later next week. These are retyped by hand and may have errors, but here's the idea:
{
  "id": "resource-manager",
  "uris": [
    "hdfs://namenode:port/dist/hadoop-2.7.0.tgz",
    "hdfs://namenode:port/dist/conf/hadoop/yarn-site.xml",
    "hdfs://namenode:port/dist/conf/hadoop/hdfs-site.xml",
    "hdfs:///dist/conf/hadoop/core-site.xml",
    "hdfs://namenode:port/dist/conf/hadoop/mapred-site.xml"
  ],
  "cmd": "cp *.xml hadoop-2.7.0/etc/hadoop && cd hadoop-2.7.0 && bin/yarn resourcemanager",
  "mem": 16,
  "instances": 1,
  "user": "yarn"
}

This has the disadvantage of not having dynamic ports, and getting specific ports is really hit or miss since I haven't found a good way to protect them from other frameworks on the same role (other than inventing a new resource), so I use ports outside the range I give Mesos. I do use mesos-dns and set yarn.resourcemanager.hostname to resource-manager.marathon.mesos. One of the reasons I like the option of getting the configs from the ResourceManager is that I'm experimenting with:

{
  "id": "resource-manager",
  "uris": [
    "hdfs://namenode:port/dist/hadoop-2.7.0.tgz",
    "hdfs://namenode:port/dist/conf/hadoop/yarn-site.xml",
    "hdfs://namenode:port/dist/conf/hadoop/hdfs-site.xml",
    "hdfs:///dist/conf/hadoop/core-site.xml",
    "hdfs://namenode:port/dist/conf/hadoop/mapred-site.xml"
  ],
  "ports": [0, 0, 0],
  "cmd": "cp *.xml hadoop-2.7.0/etc/hadoop && cd hadoop-2.7.0 && bin/yarn resourcemanager",
  "env": {
    "YARN_OPTS": "-Dyarn.resourcemanager.webapp.address=resource-manager.marathon.mesos:$PORT0 -Dyarn.resourcemanager.resource-tracker.address=resource-manager.marathon.mesos:$PORT1 -Dyarn.resourcemanager.admin.address=resource-manager.marathon.mesos:$PORT2"
  },
  "mem": 16,
  "instances": 1,
  "user": "yarn"
}

I did a similar kind of thing with the Hadoop 1 and Storm frameworks and it works pretty well, but there are some challenges unique to Myriad here. Since the NM config is pulled from yarn.resourcemanager.webapp.address, everything gets configured correctly.
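To make the dynamic-port idea concrete, here's a small shell sketch of what that env stanza amounts to inside the task sandbox, assuming Marathon exports PORT0..PORT2 for the three 0 entries in "ports" (the actual port values below are made up for illustration):

```shell
# Simulated Marathon-assigned ports; in a real task these come from Marathon.
PORT0=31001 PORT1=31002 PORT2=31003
RM_HOST=resource-manager.marathon.mesos  # the mesos-dns name from above

# Build the -D overrides the ResourceManager is launched with.
YARN_OPTS="-Dyarn.resourcemanager.webapp.address=${RM_HOST}:${PORT0}"
YARN_OPTS="${YARN_OPTS} -Dyarn.resourcemanager.resource-tracker.address=${RM_HOST}:${PORT1}"
YARN_OPTS="${YARN_OPTS} -Dyarn.resourcemanager.admin.address=${RM_HOST}:${PORT2}"

echo "$YARN_OPTS"
```

The point is that the RM binds whatever ports Mesos hands out, and anything that reads its config (like the NMs pulling from the webapp address) picks up the right values automatically.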
However, there's presently no way to update the NMs' config after startup if it changes (in the Hadoop 1 and Storm world there's no High Availability, so those tasks just die, making this a non-issue). A possible solution is HAProxy, which is probably fine for the amount of traffic going from NMs to the RM, but annoying. I'm open to suggestions on this, as it seems tricky without an external service discovery tool or extending the NodeManager class. There's also the question of dealing with two RMs in this context for HA; I believe I could implement some ZooKeeper logic for the initial sync, but I have no idea how to handle updates if an RM moves, other than HAProxy or the like. Again, open to suggestions.
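For the HAProxy route, a minimal sketch of what I have in mind, assuming a local mesos-dns resolver on each NM host, HAProxy's runtime DNS resolution (1.6+), and a fixed local port that yarn-site.xml points at; all names and port numbers below are illustrative:

```
resolvers mesosdns
    nameserver dns1 127.0.0.1:53
    hold valid 10s

# NMs point yarn.resourcemanager.resource-tracker.address at 127.0.0.1:8031;
# HAProxy re-resolves resource-manager.marathon.mesos if the RM moves.
listen rm-tracker
    bind 127.0.0.1:8031
    mode tcp
    server rm resource-manager.marathon.mesos:8031 resolvers mesosdns check
```

The NM config never changes; only HAProxy's view of where the RM currently lives does, which is what makes this workable (if annoying) without touching the NodeManager code.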