Hi all,

I have sent a pull request for the Myriad HA state store implementation here:
https://github.com/mesos/myriad/pull/123
* I want to mention that the design for Myriad HA has been strongly
  influenced by Paul Read's work here:
  https://github.com/pdread100/myriad/tree/issue-13 and here:
  https://github.com/pdread100/myriad/tree/issue-15

* I have used Paul's code to serialize and deserialize the scheduler state
  to and from the state store (see commit 1). I have made minor additions
  so that the frameworkId is also stored and retrieved. I have made sure
  to commit this code with Paul as the author.

* The pull request stores the Myriad scheduler state in the DFS.
* To use the state store implementation, add the following properties
  to yarn-site.xml on the RM:

  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.MyriadFileSystemRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.fs.state-store.uri</name>
    <value>/var/mapr/cluster/yarn/rm/system</value> <!-- Replace with the desired path -->
  </property>
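Before restarting the RM, it can help to sanity-check the configuration. The script below is a sketch only and is not part of the pull request: `read_properties` and `check_myriad_ha_config` are hypothetical helper names, and the state file path it derives simply assumes the FSRMStateRoot/RMMyriadRoot/MyriadState layout shown in the DFS listing in this mail.

```python
import posixpath
import xml.etree.ElementTree as ET

# Property names and values copied from the yarn-site.xml snippet in this mail.
REQUIRED = {
    "yarn.resourcemanager.recovery.enabled": "true",
    "yarn.resourcemanager.store.class":
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.MyriadFileSystemRMStateStore",
}
URI_KEY = "yarn.resourcemanager.fs.state-store.uri"

SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.MyriadFileSystemRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.fs.state-store.uri</name>
    <value>/var/mapr/cluster/yarn/rm/system</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config file."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def check_myriad_ha_config(xml_text):
    """Hypothetical helper: return (problems, expected MyriadState path or None)."""
    props = read_properties(xml_text)
    problems = [
        f"{key}: expected {want!r}, got {props.get(key)!r}"
        for key, want in REQUIRED.items()
        if props.get(key) != want
    ]
    uri = props.get(URI_KEY)
    if not uri:
        problems.append(f"{URI_KEY} is not set")
        return problems, None
    # Assumed layout, mirroring the FSRMStateRoot/RMMyriadRoot/MyriadState listing.
    return problems, posixpath.join(uri, "FSRMStateRoot", "RMMyriadRoot", "MyriadState")


if __name__ == "__main__":
    problems, state_file = check_myriad_ha_config(SAMPLE)
    print(problems or "config looks OK")
    print(state_file)
```

Passing the contents of the RM's real yarn-site.xml instead of SAMPLE reports any missing or mismatched property; the returned path is where `hadoop fs -ls` should show the state file once the RM has written it.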

You should then see a directory structure similar to this on the DFS:

hadoop fs -ls /var/mapr/cluster/yarn/rm/system/FSRMStateRoot

Found 4 items
drwxr-xr-x   - mapr mapr          5 2015-07-23 13:40 /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot
drwxr-xr-x   - mapr mapr         65 2015-08-02 17:19 /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMDTSecretManagerRoot
drwxr-xr-x   - mapr mapr          1 2015-07-27 17:21 /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMMyriadRoot   <--- Myriad state root folder
-rwxr-xr-x   3 mapr mapr          4 2015-07-21 10:37 /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMVersionNode

hadoop fs -ls /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMMyriadRoot

Found 1 items
-rwxr-xr-x   3 mapr mapr         80 2015-07-27 17:21 /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMMyriadRoot/MyriadState   <--- Myriad state file
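The MyriadState file holds the serialized scheduler state. To illustrate the idea of a round trip that also carries the frameworkId, here is a sketch of one possible encoding. It is NOT the format used in commit 1: the length-prefixed layout, the function names `serialize_state`/`deserialize_state`, and the task state names are all assumptions made for illustration.

```python
import struct


def serialize_state(framework_id, tasks_by_state):
    """Hypothetical encoding: frameworkId plus {state: [task_id, ...]} as length-prefixed UTF-8."""
    out = bytearray()

    def put_u32(n):
        out.extend(struct.pack(">I", n))  # 4-byte big-endian unsigned int

    def put_str(s):
        data = s.encode("utf-8")
        put_u32(len(data))
        out.extend(data)

    put_str(framework_id)
    put_u32(len(tasks_by_state))
    for state in sorted(tasks_by_state):  # sorted so the output is deterministic
        task_ids = sorted(tasks_by_state[state])
        put_str(state)
        put_u32(len(task_ids))
        for task_id in task_ids:
            put_str(task_id)
    return bytes(out)


def deserialize_state(blob):
    """Inverse of serialize_state; returns (framework_id, {state: set_of_task_ids})."""
    offset = 0

    def take(n):
        nonlocal offset
        if offset + n > len(blob):
            raise ValueError("truncated state blob")
        chunk = blob[offset:offset + n]
        offset += n
        return chunk

    def get_u32():
        return struct.unpack(">I", take(4))[0]

    def get_str():
        return take(get_u32()).decode("utf-8")

    framework_id = get_str()
    tasks_by_state = {}
    for _ in range(get_u32()):
        state = get_str()
        tasks_by_state[state] = {get_str() for _ in range(get_u32())}
    if offset != len(blob):
        raise ValueError("trailing bytes after state blob")
    return framework_id, tasks_by_state


if __name__ == "__main__":
    # Made-up values, for illustration only.
    blob = serialize_state("example-framework-id", {"active": ["nm.medium.1"], "pending": []})
    print(len(blob), deserialize_state(blob))
```

Persisting the frameworkId alongside the task IDs is what lets a restarted scheduler re-register with the Mesos master under its previous framework ID instead of appearing as a brand-new framework.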

* This pull request does not yet do the following (work in progress):
  1. Reconcile state with the Mesos master and restart NMs that are lost
     during a Myriad scheduler restart.
  2. In the case of FGS (fine-grained scaling), update the RM's view of
     NM resources for NMs that are running containers.
* The detailed design doc for Myriad HA is here:
  https://docs.google.com/document/d/1BkcDChhOLU5TDU6ZQEpIh-WBKoCwYPPi9OV-__mQlmQ/edit?usp=sharing

Please let me know your thoughts, suggestions, etc.

Regards
Swapnil
