Thanks for the clear explanation, Swapnil. This sounds awesome. A much cleaner design, and it seems like exactly the kind of thing YARN AuxServices were created for.
On Thu, Jul 16, 2015 at 5:24 PM, Swapnil Daingade <[email protected]> wrote:
> Hi All,
>
> Currently with Fine Grained Scheduling (FGS), the workflow for reporting
> status and relinquishing the resources used by a YARN container is as
> follows:
>
> 1. The NodeManager reports the status/completion of the container to the
> ResourceManager as part of the container statuses included in the NM-to-RM
> heartbeat.
>
> 2. This container status is intercepted by the Myriad Scheduler. The
> Scheduler sends a frameworkMessage to the MyriadExecutor running on the
> NodeManager's node. See NMHeartBeatHandler.handleStatusUpdate here:
> https://github.com/mesos/myriad/blob/issue_14/myriad-scheduler/src/main/java/com/ebay/myriad/scheduler/NMHeartBeatHandler.java#L112
>
> 3. This frameworkMessage instructs the MyriadExecutor to report the task
> state corresponding to the YARN container status back to Mesos.
> See MyriadExecutor.frameworkMessage here:
> https://github.com/mesos/myriad/blob/issue_14/myriad-executor/src/main/java/com/ebay/myriad/executor/MyriadExecutor.java#L252
>
> There are some disadvantages to this approach:
>
> 1. In step 2 we use the SchedulerDriver.sendFrameworkMessage() API.
> According to the API documentation, this message is best-effort:
> /**
>  * Sends a message from the framework to one of its executors. These
>  * messages are best effort; do not expect a framework message to be
>  * retransmitted in any reliable fashion.
>
> 2. This requires the Scheduler/RM to be up for YARN containers/Mesos tasks
> to be able to report statuses to the Mesos Master. If the Scheduler/RM goes
> down, we will not be able to send task statuses to Mesos until the
> Scheduler/RM is back up. This can lead to resource leaks.
>
> 3. There is the additional overhead of sending messages from the
> Scheduler/RM back to the Executors for each container on each heartbeat:
> (number of YARN containers per node * number of nodes) additional messages.
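The per-heartbeat relay described above could be sketched roughly as follows. This is a simplified, self-contained stand-in, not the actual Myriad code: the `FrameworkMessenger` interface and `relayFinishedContainers` method are invented names standing in for `SchedulerDriver.sendFrameworkMessage()` and the logic in `NMHeartBeatHandler.handleStatusUpdate`. It illustrates why one scheduler-to-executor message is generated per finished container per heartbeat.

```java
import java.util.ArrayList;
import java.util.List;

public class HeartbeatRelaySketch {

    // Stand-in for SchedulerDriver.sendFrameworkMessage(): best effort,
    // so delivery of the payload is not guaranteed.
    interface FrameworkMessenger {
        void sendFrameworkMessage(String executorHost, String payload);
    }

    // On each NM->RM heartbeat, the scheduler relays one message per
    // finished container back to the executor on that node.
    static int relayFinishedContainers(String host,
                                       List<String> finishedContainerIds,
                                       FrameworkMessenger messenger) {
        for (String containerId : finishedContainerIds) {
            messenger.sendFrameworkMessage(host, "TASK_FINISHED:" + containerId);
        }
        return finishedContainerIds.size();
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        FrameworkMessenger messenger =
            (host, payload) -> sent.add(host + "/" + payload);

        // Two nodes, each reporting two finished containers in one heartbeat
        // round: 2 containers/node * 2 nodes = 4 extra messages.
        int total = 0;
        total += relayFinishedContainers("node1", List.of("c1", "c2"), messenger);
        total += relayFinishedContainers("node2", List.of("c3", "c4"), messenger);
        System.out.println(total);       // 4
        System.out.println(sent.get(0)); // node1/TASK_FINISHED:c1
    }
}
```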
>
> In order to avoid the above-mentioned issues, we are proposing merging the
> MyriadExecutor and the NodeManager. The MyriadExecutor will run as an NM
> auxiliary service (in the same process as the NM). It will be able to
> intercept YARN container completion locally and inform the Mesos master
> irrespective of whether the Scheduler is running. We will no longer have to
> use the sendFrameworkMessage method, and there will be less message traffic
> from the Scheduler to the Executors.
>
> I have posted my proposed changes as part of the pull request here:
> https://github.com/mesos/myriad/pull/118
>
> Please take a look and let me know your feedback.
>
> Regards,
> Swapnil
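The proposed design could be sketched as below. This is a minimal, self-contained illustration under stated assumptions, not the code from the pull request: `MyriadAuxService` and `StatusSender` are simplified stand-ins for YARN's `AuxiliaryService` (whose `stopContainer` hook the NM invokes in-process when a container terminates) and for the Mesos executor driver's status-update call. The point it shows is that completion is observed locally and reported straight to Mesos, with no Scheduler/RM round trip.

```java
import java.util.ArrayList;
import java.util.List;

public class AuxServiceSketch {

    // Stand-in for the Mesos executor driver's status-update call.
    interface StatusSender {
        void sendStatusUpdate(String taskId, String state);
    }

    // Stand-in for an NM auxiliary service running in the NM process.
    static class MyriadAuxService {
        private final StatusSender sender;

        MyriadAuxService(StatusSender sender) {
            this.sender = sender;
        }

        // Invoked locally by the NM on container completion; it does not
        // depend on the Scheduler/RM being up.
        void stopContainer(String containerId, int exitCode) {
            String state = (exitCode == 0) ? "TASK_FINISHED" : "TASK_FAILED";
            sender.sendStatusUpdate(containerId, state);
        }
    }

    public static void main(String[] args) {
        List<String> updates = new ArrayList<>();
        MyriadAuxService svc =
            new MyriadAuxService((taskId, state) -> updates.add(taskId + "=" + state));

        // Container completion is reported directly, in-process.
        svc.stopContainer("container_01", 0);
        svc.stopContainer("container_02", 137);
        System.out.println(updates);
        // [container_01=TASK_FINISHED, container_02=TASK_FAILED]
    }
}
```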
