It should work with 2.5 as before (I tested on a single-node MapR cluster
with hadoop-common 2.5.1).

The only change I made was to copy the Myriad executor jars from
$PROJECT_HOME/myriad-executor/build/libs/
to $YARN_HOME/share/hadoop/yarn/lib,
since the MyriadExecutor now runs as part of the NM.

Actually, let me spin up an NM-only node and make sure all the dependencies
for the MyriadExecutor are satisfied (I tested on a node that had both the
RM and NM).

Regards
Swapnil


On Fri, Jul 17, 2015 at 5:27 AM, Darin Johnson <[email protected]>
wrote:

> Awesome, took a quick look; will test and go over the code soon. Any
> dependency issues between 2.5, 2.6, and 2.7 to be aware of?
>
> Hi All,
>
> Currently with Fine Grained Scheduling (FGS), the workflow for reporting
> the status of a YARN container and relinquishing the resources it used is
> as follows:
>
> 1. The NodeManager reports the status/completion of the container to the
>    ResourceManager as part of the container statuses included in the
>    NM-to-RM heartbeat.
>
> 2. This container status is intercepted by the Myriad Scheduler, which
>    sends a frameworkMessage to the MyriadExecutor running on the
>    NodeManager's node (both sides are sketched below, after step 3).
>    See NMHeartBeatHandler.handleStatusUpdate here:
>
>
> https://github.com/mesos/myriad/blob/issue_14/myriad-scheduler/src/main/java/com/ebay/myriad/scheduler/NMHeartBeatHandler.java#L112
>
> 3. This frameworkMessage instructs the MyriadExecutor to report the task
>    state corresponding to the YARN container status back to Mesos.
>    See MyriadExecutor.frameworkMessage here:
>
>
> https://github.com/mesos/myriad/blob/issue_14/myriad-executor/src/main/java/com/ebay/myriad/executor/MyriadExecutor.java#L252
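>
> To make steps 2 and 3 concrete, here is a minimal sketch of both sides of
> the relay. The class name ContainerStatusRelay and the
> container-id-to-task-id mapping are hypothetical simplifications of the
> linked files; sendFrameworkMessage and sendStatusUpdate are the actual
> Mesos driver calls.
>
>     import org.apache.mesos.ExecutorDriver;
>     import org.apache.mesos.Protos.ExecutorID;
>     import org.apache.mesos.Protos.SlaveID;
>     import org.apache.mesos.Protos.TaskID;
>     import org.apache.mesos.Protos.TaskState;
>     import org.apache.mesos.Protos.TaskStatus;
>     import org.apache.mesos.SchedulerDriver;
>
>     // Scheduler (RM) side: forward a completed container to the executor
>     // on the node that ran it, via a best-effort framework message.
>     class ContainerStatusRelay {
>       private final SchedulerDriver driver;
>
>       ContainerStatusRelay(SchedulerDriver driver) { this.driver = driver; }
>
>       void relayCompletedContainer(ExecutorID executorId, SlaveID slaveId,
>                                    String containerId) {
>         // The payload is just the YARN container id.
>         driver.sendFrameworkMessage(executorId, slaveId,
>             containerId.getBytes());
>       }
>     }
>
>     // Executor (NM node) side, i.e. the frameworkMessage callback of an
>     // org.apache.mesos.Executor implementation: turn the message into a
>     // terminal task status so the master can reclaim the resources.
>     public void frameworkMessage(ExecutorDriver driver, byte[] data) {
>       String containerId = new String(data);
>       driver.sendStatusUpdate(TaskStatus.newBuilder()
>           .setTaskId(TaskID.newBuilder().setValue("yarn_" + containerId))
>           .setState(TaskState.TASK_FINISHED)
>           .build());
>     }
>
> Note that if the framework message in the middle is dropped, the status
> update on the executor side never happens; that is exactly the best-effort
> weakness described below.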
>
> There are some disadvantages to this approach:
>
> 1. In step 2 we use the SchedulerDriver.sendFrameworkMessage() API.
>    According to the API documentation, this message is best effort:
>
>    /**
>     * Sends a message from the framework to one of its executors. These
>     * messages are best effort; do not expect a framework message to be
>     * retransmitted in any reliable fashion.
>     */
>
> 2. This requires the Scheduler/RM to be up for YARN containers/Mesos tasks
>    to be able to report statuses to the Mesos master. If the Scheduler/RM
>    goes down, we cannot send task statuses to Mesos until it is back up,
>    which can lead to resource leaks.
>
> 3. There is additional overhead in sending messages from the Scheduler/RM
>    back to the Executors for each container on each heartbeat:
>    (number of YARN containers per node * number of nodes) additional
>    messages.
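>
> For a sense of scale (illustrative numbers, not a measurement): at 50
> running containers per node on a 200-node cluster, that is 50 * 200 =
> 10,000 extra framework messages per heartbeat interval.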
>
> To avoid the issues mentioned above, we are proposing merging the
> MyriadExecutor and the NodeManager: the MyriadExecutor will run as an NM
> auxiliary service (in the same process as the NM). It will be able to
> intercept YARN container completion locally and inform the Mesos master
> irrespective of whether the Scheduler is running. We will no longer have
> to use the sendFrameworkMessage method, and there will be less message
> traffic from the Scheduler to the Executors.
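>
> For illustration, here is a minimal sketch of the merged approach. It
> assumes Hadoop 2.6+, where AuxiliaryService exposes a per-container
> stopContainer() hook; the class name and the task-id mapping are
> hypothetical, and the pull request below has the real implementation
> (which may intercept completion differently).
>
>     import java.nio.ByteBuffer;
>     import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext;
>     import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext;
>     import org.apache.hadoop.yarn.server.api.AuxiliaryService;
>     import org.apache.hadoop.yarn.server.api.ContainerTerminationContext;
>     import org.apache.mesos.ExecutorDriver;
>     import org.apache.mesos.Protos.TaskID;
>     import org.apache.mesos.Protos.TaskState;
>     import org.apache.mesos.Protos.TaskStatus;
>
>     public class MyriadAuxService extends AuxiliaryService {
>       private ExecutorDriver driver; // registered with the local mesos-slave
>
>       public MyriadAuxService() {
>         super("myriad_executor");
>       }
>
>       @Override
>       public void stopContainer(ContainerTerminationContext ctx) {
>         // Runs inside the NM process: report the completed container
>         // straight to the Mesos master, with no Scheduler/RM round trip.
>         driver.sendStatusUpdate(TaskStatus.newBuilder()
>             .setTaskId(TaskID.newBuilder()
>                 .setValue("yarn_" + ctx.getContainerId()))
>             .setState(TaskState.TASK_FINISHED)
>             .build());
>       }
>
>       // Remaining AuxiliaryService methods are no-ops in this sketch.
>       @Override
>       public void initializeApplication(ApplicationInitializationContext c) {}
>       @Override
>       public void stopApplication(ApplicationTerminationContext c) {}
>       @Override
>       public ByteBuffer getMetaData() { return ByteBuffer.allocate(0); }
>     }
>
> The NM loads such a service through the yarn.nodemanager.aux-services
> property in yarn-site.xml, which is also why the executor jars now need to
> be on the NM classpath (the copy step mentioned at the top of this thread).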
>
> I have posted my proposed changes as a pull request here:
> https://github.com/mesos/myriad/pull/118
>
> Please take a look and let me know your feedback.
>
> Regards
> Swapnil
>
