[jira] [Resolved] (MYRIAD-118) Changes for merging MyriadExecutor with NodeManager.

2015-10-14 Thread Santosh Marella (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYRIAD-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santosh Marella resolved MYRIAD-118.

Resolution: Fixed

Fixed in https://github.com/mesos/myriad/pull/123

Great work [~sdaingade] and [~pdread100]

> Changes for merging MyriadExecutor with NodeManager.
> 
>
> Key: MYRIAD-118
> URL: https://issues.apache.org/jira/browse/MYRIAD-118
> Project: Myriad
> Issue Type: Bug
> Reporter: Swapnil Daingade
> Assignee: Swapnil Daingade
> Fix For: Myriad 0.1.0
>
>
> * NM command line generation is now done on the Myriad scheduler side.
> * Add two classes that generate the NM command line.
> ** DownloadNMExecutorCLGenImpl (downloads binaries)
> ** NMExecutorCLGenImpl (assumes binaries are already present)
> * Myriad Executor now runs as a YARN auxiliary service.
> Tested by doing a flex up and running a YARN TeraGen job.
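
As a rough illustration of the split between the two command-line generators, here is a minimal sketch. The interface name, class names, method signature, paths, and tarball URI are all hypothetical and are not taken from the Myriad code base. Only the two behaviours (download the binaries first, or assume they are already present) come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface: builds the shell command that starts a NodeManager. */
interface NmCommandLineGenerator {
    String generate(String yarnHome);
}

/** Assumes the Hadoop binaries are already installed under yarnHome. */
class PreinstalledGenerator implements NmCommandLineGenerator {
    @Override
    public String generate(String yarnHome) {
        return yarnHome + "/bin/yarn nodemanager";
    }
}

/** Downloads and unpacks a tarball first, then reuses the plain generator. */
class DownloadingGenerator implements NmCommandLineGenerator {
    private final String tarballUri; // hypothetical location of the binaries
    private final NmCommandLineGenerator delegate = new PreinstalledGenerator();

    DownloadingGenerator(String tarballUri) {
        this.tarballUri = tarballUri;
    }

    @Override
    public String generate(String yarnHome) {
        List<String> steps = new ArrayList<>();
        steps.add("curl -fsSL " + tarballUri + " -o nm.tgz");
        steps.add("mkdir -p " + yarnHome);
        steps.add("tar -xzf nm.tgz -C " + yarnHome + " --strip-components=1");
        steps.add(delegate.generate(yarnHome));
        // Chain with && so the NodeManager only starts if every earlier step succeeded.
        return String.join(" && ", steps);
    }
}

class CommandLineDemo {
    public static void main(String[] args) {
        System.out.println(new PreinstalledGenerator().generate("/opt/hadoop"));
        System.out.println(
            new DownloadingGenerator("http://example.invalid/hadoop.tgz").generate("/opt/hadoop"));
    }
}
```

Generating the command on the scheduler side means the choice between the two strategies can be made once, from scheduler configuration, rather than on every node.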



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MYRIAD-118) Changes for merging MyriadExecutor with NodeManager.

2015-10-14 Thread Santosh Marella (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYRIAD-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santosh Marella updated MYRIAD-118:
---
Fix Version/s: Myriad 0.1.0

> Changes for merging MyriadExecutor with NodeManager.
> 
>
> Key: MYRIAD-118
> URL: https://issues.apache.org/jira/browse/MYRIAD-118
> Project: Myriad
> Issue Type: Bug
> Reporter: Swapnil Daingade
> Assignee: Swapnil Daingade
> Fix For: Myriad 0.1.0
>
>
> * NM command line generation is now done on the Myriad scheduler side.
> * Add two classes that generate the NM command line.
> ** DownloadNMExecutorCLGenImpl (downloads binaries)
> ** NMExecutorCLGenImpl (assumes binaries are already present)
> * Myriad Executor now runs as a YARN auxiliary service.
> Tested by doing a flex up and running a YARN TeraGen job.





[jira] [Assigned] (MYRIAD-118) Changes for merging MyriadExecutor with NodeManager.

2015-08-20 Thread Swapnil Daingade (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYRIAD-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swapnil Daingade reassigned MYRIAD-118:
---

Assignee: Swapnil Daingade

 Changes for merging MyriadExecutor with NodeManager.
 

 Key: MYRIAD-118
 URL: https://issues.apache.org/jira/browse/MYRIAD-118
 Project: Myriad
  Issue Type: Bug
Reporter: Swapnil Daingade
Assignee: Swapnil Daingade

 * NM command line generation is now done on the Myriad scheduler side.
 * Add two classes that generate the NM command line.
 ** DownloadNMExecutorCLGenImpl (downloads binaries)
 ** NMExecutorCLGenImpl (assumes binaries are already present)
 * Myriad Executor now runs as a YARN auxiliary service.
 Tested by doing a flex up and running a YARN TeraGen job.





Re: Merging MyriadExecutor with NodeManager

2015-07-17 Thread Adam Bordelon
Thanks for the clear explanation, Swapnil. This sounds awesome. A much
cleaner design, and it seems like exactly the kind of thing YARN
AuxServices were created for.

On Thu, Jul 16, 2015 at 5:24 PM, Swapnil Daingade 
swapnil.daing...@gmail.com wrote:

 Hi All,

 Currently with Fine Grained Scheduling (FGS), the workflow for reporting
 status and relinquishing resources used
 by a YARN container is as follows:

 1. The NodeManager reports the status/completion of the container to the
 ResourceManager
  as part of container statuses included in the NM to RM heartbeat

 2. This container status is intercepted by the Myriad Scheduler. The
 Scheduler sends a
 frameworkMessage to the MyriadExecutor running on the NodeManager node.
 See NMHeartBeatHandler.handleStatusUpdate here


 https://github.com/mesos/myriad/blob/issue_14/myriad-scheduler/src/main/java/com/ebay/myriad/scheduler/NMHeartBeatHandler.java#L112

 3. This frameworkMessage instructs the MyriadExecutor to report the task
 state corresponding to the YARN container status back to mesos.
  See MyriadExecutor.frameworkMessage here


 https://github.com/mesos/myriad/blob/issue_14/myriad-executor/src/main/java/com/ebay/myriad/executor/MyriadExecutor.java#L252

 There are some disadvantages to this approach

 1. In step 2 we use SchedulerDriver.sendFrameworkMessage() API. According
 to the API documentation, this message is best effort.
   /**
 * Sends a message from the framework to one of its executors. These
 * messages are best effort; do not expect a framework message to be
 * retransmitted in any reliable fashion.

 2. This requires the Scheduler/RM to be up for YARN containers/Mesos Tasks
 to be able to report statuses to Mesos Master.
  If Scheduler/RM goes down, we will not be able to send task statuses
 to Mesos, until the Scheduler/RM is back up.
  This can lead to resource leakages.

 3. There is additional overhead of sending messages back from Scheduler/RM
 back to the Executors for each container on each
  heartbeat. (Number of yarn containers/node * Number of Nodes)
 additional messages.

 In order to avoid the above mentioned issues, we are proposing merging of
 the MyriadExecutor and NodeManager.
 The MyriadExecutor will run as a NM auxiliary service (same process as NM).
 It will be able to intercept YARN container completion locally and inform
 the Mesos master irrespective of whether the scheduler is running.
 We will no longer have to use the sendFrameworkMessage method.
 There will be less message traffic from scheduler to executor.

 I have posted my proposed changes as part of the pull request here
 https://github.com/mesos/myriad/pull/118

 Request you take a look and let me know your feedback.

 Regards
 Swapnil



Re: Merging MyriadExecutor with NodeManager

2015-07-17 Thread Swapnil Daingade
Hi Darin,

I had to add the following in the yarn-site.xml on NMs.

<property>
  <description>A comma separated list of services where service name
  should only contain a-zA-Z0-9_ and can not start with numbers</description>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle, myriad_executor</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.myriad_executor.class</name>
  <value>com.ebay.myriad.executor.MyriadExecutorAuxService</value>
</property>

and I had to extract the contents of myriad-executor-runnable-0.0.1.jar
into $YARN_HOME/share/hadoop/yarn/lib to get it working.
Please let me know if you run into any issues or have any other feedback.
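
The naming rule quoted in the property description (only a-zA-Z0-9_, and not starting with a number) can be checked before the value goes into yarn-site.xml. The sketch below is a standalone helper, not Hadoop code. The class name is hypothetical, and trimming whitespace around each entry is an assumption of this sketch rather than a statement about how the NodeManager reads the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class AuxServiceNames {
    // Letters, digits and underscore only; the first character must not be a digit.
    private static final Pattern VALID = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Splits a comma-separated value and trims each entry (trimming is an assumption here). */
    static List<String> parse(String value) {
        List<String> names = new ArrayList<>();
        for (String raw : value.split(",")) {
            String name = raw.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Returns true if the name satisfies the rule from the property description. */
    static boolean isValid(String name) {
        return VALID.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : parse("mapreduce_shuffle, myriad_executor")) {
            System.out.println(name + " -> " + isValid(name));
        }
    }
}
```

Both names used in the configuration above pass this check, while a name such as `1executor` or `myriad-executor` would not.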

Regards
Swapnil


On Fri, Jul 17, 2015 at 8:08 AM, Swapnil Daingade 
swapnil.daing...@gmail.com wrote:

 Should work with 2.5 as before (I tested on a single-node MapR cluster with
 hadoop-common 2.5.1).

 The only change I made was to copy the Myriad executor jars under
 $PROJECT_HOME/myriad-executor/build/libs/
 to $YARN_HOME/share/hadoop/yarn/lib,
 as the MyriadExecutor now runs as part of the NM.

 Actually, let me spin up an NM-only node and make sure all the dependencies
 for the MyriadExecutor are satisfied (I tested on a node that had both RM and
 NM).

 Regards
 Swapnil


 On Fri, Jul 17, 2015 at 5:27 AM, Darin Johnson dbjohnson1...@gmail.com
 wrote:

 Awesome. I took a quick look and will test and go over the code soon. Any
 dependency issues between 2.5, 2.6 and 2.7 to be aware of?





Merging MyriadExecutor with NodeManager

2015-07-16 Thread Swapnil Daingade
Hi All,

Currently, with Fine Grained Scheduling (FGS), the workflow for reporting
status and relinquishing resources used
by a YARN container is as follows:

1. The NodeManager reports the status/completion of the container to the
ResourceManager
 as part of container statuses included in the NM to RM heartbeat

2. This container status is intercepted by the Myriad Scheduler. The
Scheduler sends a
frameworkMessage to the MyriadExecutor running on the NodeManager node.
See NMHeartBeatHandler.handleStatusUpdate here

https://github.com/mesos/myriad/blob/issue_14/myriad-scheduler/src/main/java/com/ebay/myriad/scheduler/NMHeartBeatHandler.java#L112

3. This frameworkMessage instructs the MyriadExecutor to report the task
state corresponding to the YARN container status back to Mesos.
 See MyriadExecutor.frameworkMessage here

https://github.com/mesos/myriad/blob/issue_14/myriad-executor/src/main/java/com/ebay/myriad/executor/MyriadExecutor.java#L252

There are some disadvantages to this approach:

1. In step 2 we use the SchedulerDriver.sendFrameworkMessage() API. According
to the API documentation, this message is best effort:
  /**
   * Sends a message from the framework to one of its executors. These
   * messages are best effort; do not expect a framework message to be
   * retransmitted in any reliable fashion.

2. This requires the Scheduler/RM to be up for YARN containers/Mesos tasks
to be able to report statuses to the Mesos master.
 If the Scheduler/RM goes down, we will not be able to send task statuses
to Mesos until the Scheduler/RM is back up.
 This can lead to resource leaks, since Mesos keeps counting the resources
of completed containers as in use.

3. There is the additional overhead of sending messages from the Scheduler/RM
back to the executors for each container on each
 heartbeat: (number of YARN containers per node * number of nodes)
additional messages.
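
To put a number on item 3, here is a small calculation. The cluster size, container count, and heartbeat interval are made-up figures for illustration, and the class and method names are hypothetical.

```java
class RelayOverhead {
    /** Extra scheduler-to-executor messages per heartbeat round: containers per node * nodes. */
    static long extraMessagesPerRound(long containersPerNode, long nodes) {
        return Math.multiplyExact(containersPerNode, nodes);
    }

    /** Extra messages per second if every NM heartbeats once per interval. */
    static double extraMessagesPerSecond(long containersPerNode, long nodes, double heartbeatSeconds) {
        return extraMessagesPerRound(containersPerNode, nodes) / heartbeatSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 20 containers per node, 500 nodes, 1 s heartbeat.
        System.out.println(extraMessagesPerRound(20, 500));       // 10000
        System.out.println(extraMessagesPerSecond(20, 500, 1.0)); // 10000.0
    }
}
```

The cost grows with both the node count and the container density, so it is most visible on large clusters running many short-lived containers.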

To avoid these issues, we propose merging the MyriadExecutor and the
NodeManager.
The MyriadExecutor will run as an NM auxiliary service (in the same process
as the NM).
It will be able to intercept YARN container completion locally and inform
the Mesos master irrespective of whether the scheduler is running.
We will no longer have to use the sendFrameworkMessage method, and
there will be less message traffic from scheduler to executor.
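
The difference between the two reporting paths can be shown with a toy model. This is a simulation only: it uses neither the Hadoop nor the Mesos libraries, and every class and method name in it is hypothetical. The state names TASK_FINISHED and TASK_FAILED are Mesos task states; mapping exit code 0 to success is a simplifying assumption. In a real NodeManager the local hook would presumably be one of the container lifecycle callbacks that YARN offers to auxiliary services.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model; none of these names correspond to real Myriad, YARN, or Mesos classes. */
class StatusReporting {
    /** Stand-in for the Mesos master: records the task states it receives. */
    static final class FakeMesosMaster {
        final List<String> received = new ArrayList<>();

        void statusUpdate(String taskId, String state) {
            received.add(taskId + "=" + state);
        }
    }

    /** Maps a container exit code to a task state name (assumption: 0 means success). */
    static String toTaskState(int exitCode) {
        return exitCode == 0 ? "TASK_FINISHED" : "TASK_FAILED";
    }

    /** Current flow: NM -> RM/scheduler -> executor -> Mesos. Nothing is sent if the scheduler is down. */
    static boolean reportViaScheduler(boolean schedulerUp, FakeMesosMaster master,
                                      String taskId, int exitCode) {
        if (!schedulerUp) {
            return false; // no relay, so Mesos still believes the task is running
        }
        master.statusUpdate(taskId, toTaskState(exitCode));
        return true;
    }

    /** Proposed flow: the executor lives in the NM process and reports directly. */
    static boolean reportLocally(FakeMesosMaster master, String taskId, int exitCode) {
        master.statusUpdate(taskId, toTaskState(exitCode));
        return true;
    }

    public static void main(String[] args) {
        FakeMesosMaster master = new FakeMesosMaster();
        System.out.println(reportViaScheduler(false, master, "container_1", 0)); // false
        System.out.println(reportLocally(master, "container_1", 0));             // true
        System.out.println(master.received); // [container_1=TASK_FINISHED]
    }
}
```

In the relayed path a scheduler outage silently drops the update, while the local path does not depend on the scheduler at all.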

I have posted my proposed changes as part of the pull request here:
https://github.com/mesos/myriad/pull/118

Please take a look and let me know your feedback.

Regards
Swapnil