Hi All,

Currently with Fine Grained Scheduling (FGS), the workflow for reporting
status and relinquishing resources used
by a YARN container is as following

1. The NodeManager reports the status/completion of the container to the
ResourceManager
     as part of container statuses included in the NM to RM heartbeat

2. This container status is intercepted by the Myriad Scheduler. The
Scheduler sends a
    frameworkMessage to the MyriadExecutor running on the NodeManager node.
    See NMHeartBeatHandler.handleStatusUpdate here

https://github.com/mesos/myriad/blob/issue_14/myriad-scheduler/src/main/java/com/ebay/myriad/scheduler/NMHeartBeatHandler.java#L112

3. This frameworkMessage instructs the MyriadExecutor to report the task
state corresponding to the YARN container status back to mesos.
     See MyriadExecutor.frameworkMessage here

https://github.com/mesos/myriad/blob/issue_14/myriad-executor/src/main/java/com/ebay/myriad/executor/MyriadExecutor.java#L252

There are some disadvantages to this approach

1. In step 2 we use SchedulerDriver.sendFrameworkMessage() API. According
to the API documentation, this message is best effort.
  /**
    * Sends a message from the framework to one of its executors. These
    * messages are best effort; do not expect a framework message to be
    * retransmitted in any reliable fashion.

2. This requires the Scheduler/RM to be up for YARN containers/Mesos Tasks
to be able to report statuses to Mesos Master.
     If Scheduler/RM goes down, we will not be able to send task statuses
to Mesos, until the Scheduler/RM is back up.
     This can lead to resource leakages.

3. There is additional overhead of sending messages back from Scheduler/RM
back to the Executors for each container on each
     heartbeat. (Number of yarn containers/node * Number of Nodes)
additional messages.

In order to avoid the above mentioned issues, we are proposing merging of
the MyriadExecutor and NodeManager.
The MyriadExecutor will run as a NM auxiliary service (same process as NM).
It will be able to intercept YARN container completion locally and inform
mesos-master irrespective of weather scheduler is running.
We will no longer have to use the sendFrameworkMessage method.
There will be less message traffic from scheduler to executor.

I have posted my proposed changes as part of the pull request here
https://github.com/mesos/myriad/pull/118

Request you take a look and let me know your feedback.

Regards
Swapnil

Reply via email to