Hi Darin, I had to add the following in the yarn-site.xml on NMs.
<property> <description>A comma separated list of services where service name should only contain a-zA-Z0-9_ and can not start with numbers</description> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle, myriad_executor</value> </property> <property> <name>yarn.nodemanager.aux-services.myriad_executor.class</name> <value>com.ebay.myriad.executor.MyriadExecutorAuxService</value> </property> and I had to extract the contents of myriad-executor-runnable-0.0.1.jar into $YARN_HOME/share/hadoop/yarn/lib to get it working. Please let me know if you face any issues or any other feedback you may have. Regards Swapnil On Fri, Jul 17, 2015 at 8:08 AM, Swapnil Daingade < swapnil.daing...@gmail.com> wrote: > Should work with 2.5 as earlier (I tested on single node mapr cluster with > hadoop-common 2.5.1). > > Only change I did was to copy the myriad executor jars under > $PROJECT_HOME/myriad-executor/build/libs/ > to $YARN_HOME/share/hadoop/yarn/lib > as the MyriadExecutor now runs as part of NM. > > Actually let me spin up a NM only node and make sure all the dependencies > for MyriadExecutor are satisfied (I tested on a node that had both RM and > NM). > > Regards > Swapnil > > > On Fri, Jul 17, 2015 at 5:27 AM, Darin Johnson <dbjohnson1...@gmail.com> > wrote: > >> Awesome took a quick look, will test and go over code soon. Any >> dependency >> issues between 2.5, 2.6 and 2.7 to be aware of? >> Hi All, >> >> Currently with Fine Grained Scheduling (FGS), the workflow for reporting >> status and relinquishing resources used >> by a YARN container is as following >> >> 1. The NodeManager reports the status/completion of the container to the >> ResourceManager >> as part of container statuses included in the NM to RM heartbeat >> >> 2. This container status is intercepted by the Myriad Scheduler. The >> Scheduler sends a >> frameworkMessage to the MyriadExecutor running on the NodeManager >> node. >> See NMHeartBeatHandler.handleStatusUpdate here >> >> >> https://github.com/mesos/myriad/blob/issue_14/myriad-scheduler/src/main/java/com/ebay/myriad/scheduler/NMHeartBeatHandler.java#L112 >> >> 3. This frameworkMessage instructs the MyriadExecutor to report the task >> state corresponding to the YARN container status back to mesos. >> See MyriadExecutor.frameworkMessage here >> >> >> https://github.com/mesos/myriad/blob/issue_14/myriad-executor/src/main/java/com/ebay/myriad/executor/MyriadExecutor.java#L252 >> >> There are some disadvantages to this approach >> >> 1. In step 2 we use SchedulerDriver.sendFrameworkMessage() API. According >> to the API documentation, this message is best effort. >> /** >> * Sends a message from the framework to one of its executors. These >> * messages are best effort; do not expect a framework message to be >> * retransmitted in any reliable fashion. >> >> 2. This requires the Scheduler/RM to be up for YARN containers/Mesos Tasks >> to be able to report statuses to Mesos Master. >> If Scheduler/RM goes down, we will not be able to send task statuses >> to Mesos, until the Scheduler/RM is back up. >> This can lead to resource leakages. >> >> 3. There is additional overhead of sending messages back from Scheduler/RM >> back to the Executors for each container on each >> heartbeat. (Number of yarn containers/node * Number of Nodes) >> additional messages. >> >> In order to avoid the above mentioned issues, we are proposing merging of >> the MyriadExecutor and NodeManager. >> The MyriadExecutor will run as a NM auxiliary service (same process as >> NM). >> It will be able to intercept YARN container completion locally and inform >> mesos-master irrespective of weather scheduler is running. >> We will no longer have to use the sendFrameworkMessage method. >> There will be less message traffic from scheduler to executor. >> >> I have posted my proposed changes as part of the pull request here >> https://github.com/mesos/myriad/pull/118 >> >> Request you take a look and let me know your feedback. >> >> Regards >> Swapnil >> > >