Re: Some thoughts after a crash course installation

2015-04-21 Thread Darin Johnson
Take a look at https://github.com/mesos/myriad/pull/83.  I think this
addresses your question; if so, it's being worked on.
On Apr 21, 2015 7:35 AM, Zhongyue Luo zhongyue@gmail.com wrote:

 Hi team,

 I finally got Myriad installed in our small Mesos cluster, but I have some
 questions regarding Myriad's implementation.

 I see that each slave needs a JRE and Hadoop binaries extracted in
 HADOOP_HOME.

 I understand why JRE is required on each slave but why the Hadoop binary
 files?

 Shouldn't the files needed for executing a Node manager be placed in the
 Myriad executor?

 Or is this a feature not yet implemented?

 I haven't seen Myriad's source code yet, but it would be cool if someone
 on the list could help me out here. Thanks.

 --
 *Intel SSG/STO/BDT*
 880 Zixing Road, Zizhu Science Park, Minhang District, 200241, Shanghai,
 China
 +862161166500
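
For context on the question above: frameworks commonly avoid pre-installed binaries by letting the Mesos fetcher download and extract a tarball into the task sandbox, so that slaves only need a JRE. A minimal sketch of that idea in the Mesos Java API follows; the tarball URL and launch command are placeholders, and this is only an illustration of the approach, not Myriad's implementation (see the PR linked above for the actual work):

    import org.apache.mesos.Protos.CommandInfo;

    public final class NodeManagerCommand {
      // Placeholder URL; setExtract(true) tells the Mesos fetcher to unpack
      // the tarball into the task sandbox before the command runs.
      static CommandInfo build() {
        return CommandInfo.newBuilder()
            .addUris(CommandInfo.URI.newBuilder()
                .setValue("hdfs://namenode:8020/dist/hadoop-2.7.0.tar.gz")
                .setExtract(true))
            .setValue("cd hadoop-2.7.0 && bin/yarn nodemanager")
            .build();
      }
    }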



Re: skip chown in mesos patched

2015-04-08 Thread Darin Johnson
That will eventually simplify the getCommandInfo code in TaskFactory.java
that I submitted.  That's essentially what I had to do to keep the
permissions correct; it definitely belongs in Mesos proper.

On Wed, Apr 8, 2015 at 5:32 PM, Adam Bordelon a...@mesosphere.io wrote:

 Saw that. Thanks!
 Added the RB link to the JIRA and asked Vinod if he would Shepherd the
 patch.
 In case others want to review: https://reviews.apache.org/r/32975/

 On Wed, Apr 8, 2015 at 2:19 PM, Jim Klucar klu...@gmail.com wrote:

  In case anyone hasn't seen it, I supplied a patch to Mesos that allows us
  to skip the chown step when distributing Myriad. It's in the Mesos review
  system now, and I've been in touch with Vinod.
 
  https://issues.apache.org/jira/browse/MESOS-1790
 



Re: Odd Errors

2015-08-18 Thread Darin Johnson
Could you paste the stderr/stdout of the executor as well?

I think this may be a bug, so you won't get booted :).

Darin
On Aug 18, 2015 3:07 PM, John Omernik j...@omernik.com wrote:

 I am working to stumble through this as Santosh helped me get a
 pre-incubator version of Myriad running, and now I have upgraded a bunch
 of stuff and wanted to try some of the more recent features. I set up the
 remote distribution, created what I think would be a good JSON for
 Marathon, and then I am getting the dreaded Null Pointer Exception without much help...

 Based on the logs, it appears to be pulling my URI down with the proper
 pathing and trying to execute the ResourceManager from the tarball.
 Perhaps this will get me kicked off the dev list, but my dev foo is weak,
 so I am not sure how to troubleshoot this. :) Any help would be
 appreciated.


 15/08/18 17:00:28 INFO mortbay.log: Started
 SelectChannelConnector@0.0.0.0:8192
 15/08/18 17:00:28 INFO myriad.Main: Initializing HealthChecks
 15/08/18 17:00:28 INFO myriad.Main: Initializing Profiles
 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile tiny
 with CPU: 1 and Memory: 4096
 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
 small with CPU: 2 and Memory: 8192
 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
 medium with CPU: 4 and Memory: 16384
 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile
 large with CPU: 8 and Memory: 32768
 15/08/18 17:00:28 INFO scheduler.NMProfileManager: Adding profile huge
 with CPU: 12 and Memory: 49152
 15/08/18 17:00:28 INFO myriad.Main: Validating nmInstances..
 15/08/18 17:00:28 INFO service.AbstractService: Service
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
 failed in state INITED; cause: java.lang.RuntimeException: Failed to
 initialize myriad
 java.lang.RuntimeException: Failed to initialize myriad
 at
 com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor.init(MyriadInitializationInterceptor.java:35)
 at
 com.ebay.myriad.scheduler.yarn.interceptor.CompositeInterceptor.init(CompositeInterceptor.java:76)
 at
 com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.serviceInit(MyriadFairScheduler.java:50)
 at
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
 at
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:570)
 at
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
 at
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:262)
 at
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1220)
 Caused by: java.lang.NullPointerException
 at com.ebay.myriad.Main.validateNMInstances(Main.java:166)
 at com.ebay.myriad.Main.run(Main.java:98)
 at com.ebay.myriad.Main.initialize(Main.java:80)
 at
 com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor.init(MyriadInitializationInterceptor.java:32)
 ... 10 more
 15/08/18 17:00:28 INFO service.AbstractService: Service
 RMActiveServices failed in state INITED; cause:
 java.lang.RuntimeException: Failed to initialize myriad
 java.lang.RuntimeException: Failed to initialize myriad
 at
 com.ebay.myriad.scheduler.yarn.interceptor.MyriadInitializationInterceptor.init(MyriadInitializationInterceptor.java:35)
 at
 com.ebay.myriad.scheduler.yarn.interceptor.CompositeInterceptor.init(CompositeInterceptor.java:76)
 at
 com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.serviceInit(MyriadFairScheduler.java:50)
 at
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
 at
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:570)
 at
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
 at
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:262)
 at
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1220)
 Caused by: java.lang.NullPointerException
 at com.ebay.myriad.Main.validateNMInstances(Main.java:166)
 at 
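
The NullPointerException above originates in Main.validateNMInstances, which likely means the nmInstances section of myriad-config-default.yml is missing or empty, rather than anything about the remote distribution itself. A rough sketch of the relevant YAML, inferred from the profile names in the log above and therefore only an approximation of the actual file:

    # Sketch only: profile names/sizes are taken from the log; the
    # "medium: 1" entry asks Myriad to launch one NodeManager with the
    # medium profile at startup.
    profiles:
      tiny:
        cpu: 1
        mem: 4096
      medium:
        cpu: 4
        mem: 16384
    nmInstances:
      medium: 1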

Re: Complete Myriad HA implementation

2015-08-20 Thread Darin Johnson
Sweet, look forward to checking it out.
Hi All,

I have updated my pull request with the complete Myriad HA implementation
rebased
on top of the FGS changes here

https://github.com/mesos/myriad/pull/123

I am planning to send out another email with details on how to configure it.

Regards
Swapnil


Re: Myriad 0.1 release scope

2015-08-18 Thread Darin Johnson
John,
The remote distribution doesn't require the NM to be run from Marathon,
though it's possible.  Essentially, it's the same configuration for the RM
you'd do for the non-remote version, plus adding a URI for the tarball.
I've got JSONs for running the RM in Marathon; I'll try to get them and
some documentation up soon.  I'm currently at a conference, though, which
means probably next week.

Darin
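
Until those land, a hypothetical Marathon app definition for the remote-distribution ResourceManager might look roughly like the following; the id, resource sizes, tarball URI, and command are placeholders, not the JSONs mentioned above:

    {
      "id": "yarn-resourcemanager",
      "cpus": 2,
      "mem": 4096,
      "instances": 1,
      "uris": ["hdfs://namenode:8020/dist/hadoop-2.7.0.tar.gz"],
      "cmd": "cd hadoop-2.7.0 && bin/yarn resourcemanager"
    }
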
On Aug 18, 2015 2:49 PM, John Omernik j...@omernik.com wrote:

 Ok, so I tried the remote distribution of Myriad per the docs; I guess it
 could probably use some information related to how to run the resource
 manager if it's in the tar.gz.  Perhaps an example Marathon JSON.
 I am playing with it now to figure it out.

 On Tue, Aug 18, 2015 at 3:48 PM, yuliya Feldman
 yufeld...@yahoo.com.invalid
  wrote:

  mesos/myriad is the right one so far
From: John Omernik j...@omernik.com
   To: dev@myriad.incubator.apache.org; yuliya Feldman 
 yufeld...@yahoo.com
   Sent: Tuesday, August 18, 2015 1:44 PM
   Subject: Re: Myriad 0.1 release scope
 
  (So if I clone that repo, am I cloning the right one?)
 
 
 
  On Tue, Aug 18, 2015 at 3:43 PM, John Omernik j...@omernik.com wrote:
 
   Ok, I was going off
  
 https://github.com/mesos/myriad/blob/phase1/docs/myriad-configuration.md
  
   I will try it.
  
   John
  
   On Tue, Aug 18, 2015 at 3:40 PM, yuliya Feldman 
   yufeld...@yahoo.com.invalid wrote:
  
   You actually do not need to rebuild even today - just keep this file
 in
   hadoop config directory that is on the classpath: like .../etc/hadoop
From: John Omernik j...@omernik.com
To: dev@myriad.incubator.apache.org
Sent: Tuesday, August 18, 2015 1:35 PM
Subject: Re: Myriad 0.1 release scope
  
   On the release scope, will having the myriad configuration file exist
   outside the jar (i.e. you can change configuration without rebuilding)
  be
   part of the .1 release scope?
  
  
  
   On Mon, Aug 10, 2015 at 10:01 PM, Santosh Marella 
  smare...@maprtech.com
   wrote:
  
Hello All,
   
 I've merged the FGS changes into phase1. Built and tested both
 coarse
grained scaling and fine grained scaling, UI on a 4 node cluster.
   
 If anyone finds things are not working as expected, please let me
  know.
   
Thanks,
Santosh
   
On Fri, Aug 7, 2015 at 10:46 AM, Santosh Marella 
  smare...@maprtech.com
   
wrote:
   
 Hello guys,

 I propose merging FGS into phase1. As I said before, I think it's
  at a
 point where the functionality works reasonably well.
 Any future improvements/fixes/UI changes can be done via separate
   JIRAs.

 Unless there are any major concerns, I'd like to merge FGS into
  phase1
 *EOD Monday* (PDT).

 Thanks,
 Santosh

 On Wed, Aug 5, 2015 at 8:16 PM, Santosh Marella 
   smare...@maprtech.com
 wrote:

 I feel FGS is very close to making it into 0.1. PR 116 addresses
   moving
 to hadoop 2.7 and making FGS and CGS coexist. This PR was
 recently
reviewed
 by Yulia and Darin. Darin had also tried out FGS on hadoop 2.6.x
  and
2.7.x
 clusters and it seemed to have worked as expected. Unless there
 are
   more
 reviews/feedback, it can be merged into issue_14. Once PR 116 is
   merged
 into issue_14, issue_14 can be merged into phase1.

 Thanks,
 Santosh

 On Tue, Aug 4, 2015 at 4:54 PM, Adam Bordelon 
 a...@mesosphere.io
wrote:

 We do have a JIRA 0.1.0 fix version field, but none of our
  issues
   use
 it
 yet.
 I think the goal was just to take what we have and make it work
   under
 Apache infrastructure, then vote on that for 0.1.0.
 Although other features like HA or FGS would be great, let's try
  to
   get
 our
 first Apache release out ASAP.
 We can create 0.1.1 or 0.2.0 fix versions for subsequent
 releases
   with
 other issues/features. Roadmap would be great.
 (I'm just summarizing what we discussed a month or two ago. Feel
   free
to
 correct me or disagree with this approach.)

 On Tue, Aug 4, 2015 at 4:44 PM, Swapnil Daingade 
 swapnil.daing...@gmail.com
  wrote:

  Hi all,
 
  Was wondering what would be the scope for the Myriad 0.1
  release.
  It would be nice to have a roadmap page somewhere and target
  features to releases (JIRA 'fix version' field perhaps)
 
  Regards
  Swapnil
 




   
  
  
  
  
  
  
 
 
 
 



Re: Issues 16 and Issue 12

2015-08-03 Thread Darin Johnson
Swapnil,

Looked over both docs, HA and NM restart.  They're pretty high level, so
I'll look forward to the details.  Initial thoughts:

1. Getting framework reconciliation going would likely eliminate certain
issues, such as sendFrameworkMessage being unreliable, so it should be
implemented sooner rather than later.

2. How stable is the RMStateStore API? If there are changes between
versions of Hadoop, it might be best to use Mesos's State API.

3. There was no mention of running two RMs in traditional Hadoop RM HA
(maybe even in Marathon), but this should be considered a possibility. That
may have been implicit.

Saw the PR; I will look at it.

Darin
Hi Darin,

The Myriad HA work will involve work related to issue 16.
I already have the Myriad HA design doc for review.
Your feedback on it would be really helpful.
I also plan to send out for review parts of the Myriad HA implementation
(although it does not address task reconciliation yet). I was planning to
work on it next.

Regards
Swapnil


On Mon, Aug 3, 2015 at 12:08 PM, Darin Johnson dbjohnson1...@gmail.com
wrote:

 Is anyone actively working these?  I'm interested in both of these and
 should have some cycles to work on them.

 One question I have on issue 12 is how to generalize scheduling policies
 if we have autoscaling, fine-grained scheduling, and fixed resources (with
 a flex-up/flex-down option).  Currently it seems as though FGS is embedded
 pretty deeply.  Ideally, though, we could have a SchedulerPolicy interface,
 and users could specify the SchedulerPolicy via the Myriad config.

 If I don't get a response, I'll probably start issue 16 as it's
 straightforward, and write something up on 12.

 Darin
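
A rough sketch of the kind of SchedulerPolicy abstraction described above; every name here is hypothetical and not taken from the Myriad code base. FGS, autoscaling, and fixed flex-up/flex-down would then just be different implementations selected through the Myriad config:

    public interface SchedulerPolicy {

      /** Minimal, made-up snapshot of what a policy might need to know. */
      final class ClusterState {
        public final int runningNodeManagers;
        public final int pendingYarnContainerRequests;

        public ClusterState(int runningNodeManagers, int pendingYarnContainerRequests) {
          this.runningNodeManagers = runningNodeManagers;
          this.pendingYarnContainerRequests = pendingYarnContainerRequests;
        }
      }

      /** How many NodeManagers to launch given the current state (0 for none). */
      int instancesToFlexUp(ClusterState state);

      /** How many NodeManagers to shut down given the current state (0 for none). */
      int instancesToFlexDown(ClusterState state);
    }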



Re: Myriad 0.1 release scope

2015-08-07 Thread Darin Johnson
So I compiled the 2.5 FGS code against 2.6 when I was testing.  If we
abstract this right, it may just be an if statement or two.
On Aug 7, 2015 6:47 PM, Santosh Marella smare...@maprtech.com wrote:

  Myriad code base compiled against hadoop 2.7 should work on hadoop 2.5
  cluster as long as FGS (i.e. zero profile NM) is not used.

 Verified the above. As long as FGS (zero profile NM) is not used,
 Myriad compiled against hadoop 2.7 will work on hadoop 2.5.

 Thanks,
 Santosh

 On Fri, Aug 7, 2015 at 2:20 PM, Santosh Marella smare...@maprtech.com
 wrote:

   It will make working on HA easier
  Oh Yes!
 
   how do we facilitate that? Profiles?
  Profiles might be one way. Currently, FGS is supported for zero profile
  only.
  And we have seen there was an API incompatibility from 2.5 to 2.6+ in FGS
  code.
  So, ideally (since I haven't tried it myself), when FGS is merged into
  phase1,
  the Myriad code base compiled against hadoop 2.7 should work on hadoop
 2.5
  cluster as long as FGS (i.e. zero profile NM) is not used. (I'll try this
  out and
  post back what I find)
 
  However, in the long term we need a mechanism to abstract out the APIs
  that are incompatible across versions.
 
  Thanks,
  Santosh
 
  On Fri, Aug 7, 2015 at 12:12 PM, Darin Johnson dbjohnson1...@gmail.com
  wrote:
 
  It will make working on HA easier.  However, one concern that's been
  addressed previously is that FGS works for Hadoop 2.6.0+. Do we plan to
  support 2.5.X (anything lower?) also as Santosh has a way to do that, if
  so
  how do we facilitate that? Profiles?
 
  Darin
 
  On Fri, Aug 7, 2015 at 1:46 PM, Santosh Marella smare...@maprtech.com
  wrote:
 
   Hello guys,
  
   I propose merging FGS into phase1. As I said before, I think it's at a
   point where the functionality works reasonably well.
   Any future improvements/fixes/UI changes can be done via separate
 JIRAs.
  
   Unless there are any major concerns, I'd like to merge FGS into phase1
  *EOD
   Monday* (PDT).
  
   Thanks,
   Santosh
  
   On Wed, Aug 5, 2015 at 8:16 PM, Santosh Marella 
 smare...@maprtech.com
   wrote:
  
I feel FGS is very close to making it into 0.1. PR 116 addresses
  moving
   to
hadoop 2.7 and making FGS and CGS coexist. This PR was recently
  reviewed
   by
Yulia and Darin. Darin had also tried out FGS on hadoop 2.6.x and
  2.7.x
clusters and it seemed to have worked as expected. Unless there are
  more
reviews/feedback, it can be merged into issue_14. Once PR 116 is
  merged
into issue_14, issue_14 can be merged into phase1.
   
Thanks,
Santosh
   
On Tue, Aug 4, 2015 at 4:54 PM, Adam Bordelon a...@mesosphere.io
   wrote:
   
We do have a JIRA 0.1.0 fix version field, but none of our issues
  use
   it
yet.
I think the goal was just to take what we have and make it work
 under
Apache infrastructure, then vote on that for 0.1.0.
Although other features like HA or FGS would be great, let's try to
  get
our
first Apache release out ASAP.
We can create 0.1.1 or 0.2.0 fix versions for subsequent releases
  with
other issues/features. Roadmap would be great.
(I'm just summarizing what we discussed a month or two ago. Feel
  free to
correct me or disagree with this approach.)
   
On Tue, Aug 4, 2015 at 4:44 PM, Swapnil Daingade 
swapnil.daing...@gmail.com
 wrote:
   
 Hi all,

 Was wondering what would be the scope for the Myriad 0.1 release.
 It would be nice to have a roadmap page somewhere and target
 features to releases (JIRA 'fix version' field perhaps)

 Regards
 Swapnil

   
   
   
  
 
 
 



Re: Myriad 0.1 release scope

2015-08-07 Thread Darin Johnson
It will make working on HA easier.  However, one concern that's been
raised previously is that FGS works only for Hadoop 2.6.0+. Do we plan to
support 2.5.x (or anything lower) as well? Santosh has a way to do that; if
so, how do we facilitate it? Profiles?

Darin
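
For illustration, one way profiles could gate this: only clusters on Hadoop 2.6+ would declare (and flex up) the zero profile that FGS relies on, while 2.5.x clusters would stick to fixed-size profiles. A hypothetical myriad-config-default.yml fragment, with names and sizes that are illustrative only:

    profiles:
      zero:        # fine-grained scaling; omit (or never flex up) on Hadoop 2.5.x
        cpu: 0
        mem: 0
      medium:      # coarse-grained profile; works on 2.5.x and 2.6+
        cpu: 4
        mem: 16384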

On Fri, Aug 7, 2015 at 1:46 PM, Santosh Marella smare...@maprtech.com
wrote:

 Hello guys,

 I propose merging FGS into phase1. As I said before, I think it's at a
 point where the functionality works reasonably well.
 Any future improvements/fixes/UI changes can be done via separate JIRAs.

 Unless there are any major concerns, I'd like to merge FGS into phase1 *EOD
 Monday* (PDT).

 Thanks,
 Santosh

 On Wed, Aug 5, 2015 at 8:16 PM, Santosh Marella smare...@maprtech.com
 wrote:

  I feel FGS is very close to making it into 0.1. PR 116 addresses moving
 to
  hadoop 2.7 and making FGS and CGS coexist. This PR was recently reviewed
 by
  Yulia and Darin. Darin had also tried out FGS on hadoop 2.6.x and 2.7.x
  clusters and it seemed to have worked as expected. Unless there are more
  reviews/feedback, it can be merged into issue_14. Once PR 116 is merged
  into issue_14, issue_14 can be merged into phase1.
 
  Thanks,
  Santosh
 
  On Tue, Aug 4, 2015 at 4:54 PM, Adam Bordelon a...@mesosphere.io
 wrote:
 
  We do have a JIRA 0.1.0 fix version field, but none of our issues use
 it
  yet.
  I think the goal was just to take what we have and make it work under
  Apache infrastructure, then vote on that for 0.1.0.
  Although other features like HA or FGS would be great, let's try to get
  our
  first Apache release out ASAP.
  We can create 0.1.1 or 0.2.0 fix versions for subsequent releases with
  other issues/features. Roadmap would be great.
  (I'm just summarizing what we discussed a month or two ago. Feel free to
  correct me or disagree with this approach.)
 
  On Tue, Aug 4, 2015 at 4:44 PM, Swapnil Daingade 
  swapnil.daing...@gmail.com
   wrote:
 
   Hi all,
  
   Was wondering what would be the scope for the Myriad 0.1 release.
   It would be nice to have a roadmap page somewhere and target
   features to releases (JIRA 'fix version' field perhaps)
  
   Regards
   Swapnil
  
 
 
 



Re: JIRA work for 0.1.0

2015-10-23 Thread Darin Johnson
Yuliya, Jim's right, this belongs on JIRA; I only commented here as you
mentioned it.  Let's continue there.

On Fri, Oct 23, 2015 at 1:38 PM, yuliya Feldman <yufeld...@yahoo.com.invalid
> wrote:

> Do you have other suggestions?
> I understand that there might be clashes even with randomization, but it
> is much lower chance then consistently getting the same port w/o
> randomization.
> Thanks,Yuliya  From: Darin Johnson <dbjohnson1...@gmail.com>
>  To: Dev <dev@myriad.incubator.apache.org>; yuliya Feldman <
> yufeld...@yahoo.com>
>  Sent: Friday, October 23, 2015 9:57 AM
>  Subject: Re: JIRA work for 0.1.0
>
> On MYRIAD-160, it's a really bad idea to let Myriad pick a random port
> outside of Mesos as other frameworks might select that port the probability
> of that happening is 1-(number of ports requested by other
> frameworks)/(number of ports in use).  This could get near 50% and cause
> frameworks that are being good citizens to crash.
>
> I'm also not convinced randomizing the port is in fact the correct fix for
> this in the long term, as there is still a non-zero chance you'll get that
> port again.
>
> Darin
>
>
>
>
>
> On Fri, Oct 23, 2015 at 12:56 AM, yuliya Feldman <
> yufeld...@yahoo.com.invalid> wrote:
>
> > Great list.
> > I would include MYRIAD-160 to the list - I am working on that one. Of
> > course as a workaround we could not use Mesos ports and let NM ports
> > randomization kick in.I also almost done with MYRIAD-148 - it was really
> > tricky. Should submit PR tonight.
> > Thanks,Yuliya
> >  From: Darin Johnson <dbjohnson1...@gmail.com>
> >  To: Dev <dev@myriad.incubator.apache.org>
> >  Sent: Thursday, October 22, 2015 8:29 PM
> >  Subject: Re: JIRA work for 0.1.0
> >
> > I think this sounds good about right.  A few Jim marked were new
> features,
> > gotta leave something for the 0.2.0 release :).
> >
> >
> >
> >
> > On Thu, Oct 22, 2015 at 8:08 AM, Santosh Marella <smare...@maprtech.com>
> > wrote:
> >
> > > I looked at the JIRAs currently marked with fix version "myriad-0.1.0".
> > > There were 19 of them. I moved a few out. We are currently at 14.
> > >
> > > However, IMO the show stoppers are really the following:
> > >
> > > MYRIAD-43 Replace com.ebay namespace with org.apache
> > > MYRIAD-44 Prepare for 0.1.0 release
> > > MYRIAD-98 Move from 4 spaces to 2 spaces for indentation
> > > MYRIAD-114 Automatic dashboard building
> > > MYRIAD-145 Document Myriad Release Process
> > > MYRIAD-150 Update NOTICE file
> > > MYRIAD-159 Change default mesos version to 0.24
> > >
> > > Unless anyone thinks there are other JIRAs that are show stoppers, I
> > think
> > > we should stick to the above list
> > > and cut a RC as soon as we address the above.
> > >
> > > **I'm positive the above can be fixed by early next week (10/27) and we
> > can
> > > have a RC out for voting mid next week (10/28).**
> > >
> > > If we can't fix the above JIRAs in time or if new ones come up as "show
> > > stoppers", we will have a revised date.
> > > And, of course, more fixes are welcome, as long as they can be merged
> > > before 10/27.
> > >
> > > Just to let everyone know about the Apache release process (@Adam, feel
> > > free to chime in):
> > >  - Apache requires that a RC be put out for voting on
> > dev@myriad.incubator
> > > for 72 hrs or until 3 binding +1s and no binding -1s,
> > >  - followed by a similar voting round on general@incubator.
> > >
> > > Now to run the last mile..!
> > >
> > > Cheers,
> > > Santosh
> > >
> > > On Wed, Oct 21, 2015 at 2:48 PM, Adam Bordelon <a...@mesosphere.io>
> > wrote:
> > >
> > > > Keep in mind that Santosh (as Release Manager) has final authority
> over
> > > > pushing things out of 0.1.0.
> > > >
> > > > I created a (hopefully public) filter for Unresolved 0.1.0 Myriad
> > JIRAs:
> > > > https://issues.apache.org/jira/browse/MYRIAD-44?filter=12333786
> > > >
> > > > Maybe we should create a dashboard too, like
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12327111
> > > >
> > > > On Wed, Oct 21, 2015 at 1:38 PM, Jim Klucar <klu...@gmail.com>
> wrote:
> > > >
> > > > > I went through JIRA and assigned a bunch of the tickets to the
> Myriad
> > > > 0.1.0
> > > > > release. Some of them are really low hanging fruit, so I think we
> can
> > > > knock
> > > > > most of these out. As Adam said in the meeting, we'd rather have
> too
> > > many
> > > > > flagged for the release and pair down rather than miss some. To
> that
> > > end,
> > > > > please go through the tickets that are still unmarked and set the
> Fix
> > > > > Version/s: to Myriad 0.1.0 if you think they should be fixed or by
> > the
> > > > > release.
> > > > >
> > > > > Jim
> > > > >
> > > >
> > >
> >
> >
> >
> >
>
>
>
>


Re: JIRA work for 0.1.0

2015-10-23 Thread Darin Johnson
On MYRIAD-160, it's a really bad idea to let Myriad pick a random port
outside of Mesos, as another framework might select that port; the
probability of that happening is 1 - (number of ports requested by other
frameworks)/(number of ports in use).  This could get near 50% and cause
frameworks that are being good citizens to crash.

I'm also not convinced randomizing the port is in fact the correct fix for
this in the long term, as there is still a non-zero chance you'll get that
port again.

Darin
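
For reference, a sketch of consuming ports from the offer with the Mesos Java API; this is illustrative only and not the MYRIAD-160 patch:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.mesos.Protos.Offer;
    import org.apache.mesos.Protos.Resource;
    import org.apache.mesos.Protos.Value;

    public final class OfferedPorts {
      // Collect the ports the offer actually contains; anything chosen outside
      // this set is invisible to Mesos and can collide with other frameworks.
      static List<Long> fromOffer(Offer offer) {
        List<Long> ports = new ArrayList<>();
        for (Resource resource : offer.getResourcesList()) {
          if (!"ports".equals(resource.getName())) {
            continue;
          }
          for (Value.Range range : resource.getRanges().getRangeList()) {
            for (long port = range.getBegin(); port <= range.getEnd(); port++) {
              ports.add(port);
            }
          }
        }
        return ports;
      }
    }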



On Fri, Oct 23, 2015 at 12:56 AM, yuliya Feldman <
yufeld...@yahoo.com.invalid> wrote:

> Great list.
> I would include MYRIAD-160 to the list - I am working on that one. Of
> course as a workaround we could not use Mesos ports and let NM ports
> randomization kick in.I also almost done with MYRIAD-148 - it was really
> tricky. Should submit PR tonight.
> Thanks,Yuliya
>   From: Darin Johnson <dbjohnson1...@gmail.com>
>  To: Dev <dev@myriad.incubator.apache.org>
>  Sent: Thursday, October 22, 2015 8:29 PM
>  Subject: Re: JIRA work for 0.1.0
>
> I think this sounds good about right.  A few Jim marked were new features,
> gotta leave something for the 0.2.0 release :).
>
>
>
>
> On Thu, Oct 22, 2015 at 8:08 AM, Santosh Marella <smare...@maprtech.com>
> wrote:
>
> > I looked at the JIRAs currently marked with fix version "myriad-0.1.0".
> > There were 19 of them. I moved a few out. We are currently at 14.
> >
> > However, IMO the show stoppers are really the following:
> >
> > MYRIAD-43 Replace com.ebay namespace with org.apache
> > MYRIAD-44 Prepare for 0.1.0 release
> > MYRIAD-98 Move from 4 spaces to 2 spaces for indentation
> > MYRIAD-114 Automatic dashboard building
> > MYRIAD-145 Document Myriad Release Process
> > MYRIAD-150 Update NOTICE file
> > MYRIAD-159 Change default mesos version to 0.24
> >
> > Unless anyone thinks there are other JIRAs that are show stoppers, I
> think
> > we should stick to the above list
> > and cut a RC as soon as we address the above.
> >
> > **I'm positive the above can be fixed by early next week (10/27) and we
> can
> > have a RC out for voting mid next week (10/28).**
> >
> > If we can't fix the above JIRAs in time or if new ones come up as "show
> > stoppers", we will have a revised date.
> > And, of course, more fixes are welcome, as long as they can be merged
> > before 10/27.
> >
> > Just to let everyone know about the Apache release process (@Adam, feel
> > free to chime in):
> >  - Apache requires that a RC be put out for voting on
> dev@myriad.incubator
> > for 72 hrs or until 3 binding +1s and no binding -1s,
> >  - followed by a similar voting round on general@incubator.
> >
> > Now to run the last mile..!
> >
> > Cheers,
> > Santosh
> >
> > On Wed, Oct 21, 2015 at 2:48 PM, Adam Bordelon <a...@mesosphere.io>
> wrote:
> >
> > > Keep in mind that Santosh (as Release Manager) has final authority over
> > > pushing things out of 0.1.0.
> > >
> > > I created a (hopefully public) filter for Unresolved 0.1.0 Myriad
> JIRAs:
> > > https://issues.apache.org/jira/browse/MYRIAD-44?filter=12333786
> > >
> > > Maybe we should create a dashboard too, like
> > >
> >
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12327111
> > >
> > > On Wed, Oct 21, 2015 at 1:38 PM, Jim Klucar <klu...@gmail.com> wrote:
> > >
> > > > I went through JIRA and assigned a bunch of the tickets to the Myriad
> > > 0.1.0
> > > > release. Some of them are really low hanging fruit, so I think we can
> > > knock
> > > > most of these out. As Adam said in the meeting, we'd rather have too
> > many
> > > > flagged for the release and pair down rather than miss some. To that
> > end,
> > > > please go through the tickets that are still unmarked and set the Fix
> > > > Version/s: to Myriad 0.1.0 if you think they should be fixed or by
> the
> > > > release.
> > > >
> > > > Jim
> > > >
> > >
> >
>
>
>
>


Re: JIRA work for 0.1.0

2015-10-22 Thread Darin Johnson
I think this sounds about right.  A few of the ones Jim marked were new
features; gotta leave something for the 0.2.0 release :).


On Thu, Oct 22, 2015 at 8:08 AM, Santosh Marella 
wrote:

> I looked at the JIRAs currently marked with fix version "myriad-0.1.0".
> There were 19 of them. I moved a few out. We are currently at 14.
>
> However, IMO the show stoppers are really the following:
>
> MYRIAD-43 Replace com.ebay namespace with org.apache
> MYRIAD-44 Prepare for 0.1.0 release
> MYRIAD-98 Move from 4 spaces to 2 spaces for indentation
> MYRIAD-114 Automatic dashboard building
> MYRIAD-145 Document Myriad Release Process
> MYRIAD-150 Update NOTICE file
> MYRIAD-159 Change default mesos version to 0.24
>
> Unless anyone thinks there are other JIRAs that are show stoppers, I think
> we should stick to the above list
> and cut a RC as soon as we address the above.
>
> **I'm positive the above can be fixed by early next week (10/27) and we can
> have a RC out for voting mid next week (10/28).**
>
> If we can't fix the above JIRAs in time or if new ones come up as "show
> stoppers", we will have a revised date.
> And, of course, more fixes are welcome, as long as they can be merged
> before 10/27.
>
> Just to let everyone know about the Apache release process (@Adam, feel
> free to chime in):
>  - Apache requires that a RC be put out for voting on dev@myriad.incubator
> for 72 hrs or until 3 binding +1s and no binding -1s,
>  - followed by a similar voting round on general@incubator.
>
> Now to run the last mile..!
>
> Cheers,
> Santosh
>
> On Wed, Oct 21, 2015 at 2:48 PM, Adam Bordelon  wrote:
>
> > Keep in mind that Santosh (as Release Manager) has final authority over
> > pushing things out of 0.1.0.
> >
> > I created a (hopefully public) filter for Unresolved 0.1.0 Myriad JIRAs:
> > https://issues.apache.org/jira/browse/MYRIAD-44?filter=12333786
> >
> > Maybe we should create a dashboard too, like
> >
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12327111
> >
> > On Wed, Oct 21, 2015 at 1:38 PM, Jim Klucar  wrote:
> >
> > > I went through JIRA and assigned a bunch of the tickets to the Myriad
> > 0.1.0
> > > release. Some of them are really low hanging fruit, so I think we can
> > knock
> > > most of these out. As Adam said in the meeting, we'd rather have too
> many
> > > flagged for the release and pair down rather than miss some. To that
> end,
> > > please go through the tickets that are still unmarked and set the Fix
> > > Version/s: to Myriad 0.1.0 if you think they should be fixed or by the
> > > release.
> > >
> > > Jim
> > >
> >
>


Re: [VOTE] Release apache-myriad-0.1.0-incubating (release candidate 1)

2015-11-11 Thread Darin Johnson
That's a good idea.

On Wed, Nov 11, 2015 at 10:03 AM, Jim Klucar <klu...@gmail.com> wrote:

> 0 (non-binding)
>
> Vagrant environment is broken.
> I did a `vagrant up` and ran the setup-yarn-1.sh and setup-yarn-2.sh
> scripts. The first had a slight problem, the second failed.
> I then tried `./gradlew build` from inside vagrant and the build failed in
> the web-ui. I believe it is due to how vagrant maps things to /vagrant but
> didn't really dig into it. It builds fine on my local machine.
>
> I recommend removing the Vagrantfile and the setup-yarn-* scripts and
> releasing. We can then decide to revamp or permanently remove the Vagrant
> setup for a separate release.
>
>
>
> On Tue, Nov 10, 2015 at 10:42 PM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > +1
> > D/L'd tar ball verified checksums
> > Flexed up/down nodes and JHS
> > Ran MR job with FGS
> >
> >
> >
> > On Tue, Nov 10, 2015 at 9:12 PM, Sarjeet Singh <
> sarjeetsi...@maprtech.com>
> > wrote:
> >
> > > +1 (Non-Binding)
> > >
> > > Verified checksums.
> > > Downloaded myriad-0.1.0-incubating-rc1.tar.gz, Compiled the code and
> > > deployed it on a 4 node MapR cluster.
> > > Tried basic functionality tests for FGS/CGS flex up/down and it worked
> > > fine.
> > > Tried running M/R job and it completed successfully.
> > > Tried framework shutdown, shutdown went smooth.
> > > Tried JHS configuration and service flex-up, and it worked fine.
> > >
> > >
> > > Thanks,
> > > Sarjeet Singh
> > >
> > > On Tue, Nov 10, 2015 at 3:05 PM, Santosh Marella <
> smare...@maprtech.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have created a build for Apache Myriad 0.1.0-incubating, release
> > > > candidate 1.
> > > >
> > > > Thanks to everyone who has contributed to this release.
> > > >
> > > > Here’s the release notes:
> > > > https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> > > >
> > > > The commit to be voted upon is tagged with
> > "myriad-0.1.0-incubating-rc1"
> > > > and is available here:
> > > >
> > > >
> > >
> >
> https://git1-us-west.apache.org/repos/asf/incubator-myriad/repo?p=incubator-myriad.git;a=commit;h=9f0fa15bfaa4fdc309ada27126567a2aa5bf296b
> > > >
> > > > The artifacts to be voted upon are located here:
> > > > *
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc1/
> > > > <
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc1/
> > > > >*
> > > >
> > > > Release artifacts are signed with the following key:
> > > > https://people.apache.org/keys/committer/smarella.asc
> > > >
> > > > Please vote on releasing this package as Apache Myriad
> > 0.1.0-incubating.
> > > >
> > > > The vote is open for the next 72 hours and passes if a majority of
> > > > at least three +1 PPMC votes are cast.
> > > >
> > > > [ ] +1 Release this package as Apache Myriad 0.1.0-incubating
> > > > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > > > [ ] -1 Do not release this package because...
> > > >
> > > > Here is my vote:
> > > > +1 (binding)
> > > >
> > > > Thanks,
> > > > Santosh
> > > >
> > >
> >
>


Re: [VOTE] Release apache-myriad-0.1.0-incubating (release candidate 2)

2015-11-16 Thread Darin Johnson
I agree with Adam; it's really hard for me to find blocks of time on the
weekends with so little notice.

Darin

On Sun, Nov 15, 2015 at 8:01 PM, Adam Bordelon <a...@mesosphere.io> wrote:

> In the future, let's try to allow 3 business days for votes, rather than
> just 72 hours. This gives people more of a chance to test and vote.
>
> That said, I think we can call this a successful vote, since there were 3
> binding +1s and no -1s. Also, we got additional binding +1s on the previous
> release candidate, which was not substantially different. Next we'll have
> to get the IPMC to vote on it.
>
> Great work, team!
>
> On Sun, Nov 15, 2015 at 4:13 PM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > +1
> > On Nov 15, 2015 6:33 PM, "yuliya Feldman" <yufeld...@yahoo.com.invalid>
> > wrote:
> >
> > > Thank you Jim
> > > You are not late - it is still 3:30 PM PDT :)
> > > Committers - we need at least one more vote from you for RC2 - you have
> > > 1.5 hours left.
> > > Thanks,Yuliya
> > >   From: Jim Klucar <klu...@gmail.com>
> > >  To: "dev@myriad.incubator.apache.org" <
> dev@myriad.incubator.apache.org>
> > >  Sent: Sunday, November 15, 2015 3:12 PM
> > >  Subject: Re: [VOTE] Release apache-myriad-0.1.0-incubating (release
> > > candidate 2)
> > >
> > > Even though I'm late, I had no problems.
> > >
> > > +1 NB
> > >
> > > On Saturday, November 14, 2015, Santosh Marella <smare...@maprtech.com
> >
> > > wrote:
> > >
> > > > A friendly reminder that the vote ends at 4:55 PM (PDT) on Sunday
> > > > (tomorrow), 11/15.
> > > >
> > > > Thanks,
> > > > Santosh
> > > >
> > > > On Fri, Nov 13, 2015 at 5:17 PM, Sarjeet Singh <
> > > sarjeetsi...@maprtech.com
> > > > <javascript:;>>
> > > > wrote:
> > > >
> > > > > +1 (Non-Binding)
> > > > >
> > > > > Verified checksums.
> > > > > D/L myriad-0.1.0-incubating-rc2.tar.gz, Compiled & deployed it on
> a 4
> > > > node
> > > > > MapR cluster.
> > > > > Tried FGS/CGS flex up/down, and ran hadoop M/R jobs.
> > > > > Tried myriad HA with RM restart and kill -9.
> > > > > Tried framework shutdown, and restart myriad again.
> > > > > Tried JHS configuration flex up/down and its functionality.
> > > > >
> > > > > -Sarjeet
> > > > >
> > > > > On Fri, Nov 13, 2015 at 4:17 PM, Aashreya Shankar <
> > > ashan...@maprtech.com
> > > > <javascript:;>>
> > > > > wrote:
> > > > >
> > > > > > +1 (non binding)
> > > > > >
> > > > > > Successfully built binaries from rc2 tar.gz
> > > > > > Tried it 5 node MapR cluster
> > > > > > Flex up/down works accordingly
> > > > > > Hadoop jobs running fine.
> > > > > > Build was successful through Vagrant
> > > > > >
> > > > > > Thank you
> > > > > > Aashreya
> > > > > >
> > > > > > On Fri, Nov 13, 2015 at 3:13 PM, Swapnil Daingade <
> > > > > > swapnil.daing...@gmail.com <javascript:;>> wrote:
> > > > > >
> > > > > > > Downloaded rc2 tar.gz
> > > > > > > * Verified md5 and sha512 hashes successfully
> > > > > > > * Built binaries successfully
> > > > > > > * Deployed on 3 node MapR cluster
> > > > > > > * Tested NM flexup/flexdown with HA enabled and disabled.
> > > > > > > * Tried HA
> > > > > > > * Tried Framework Shutdown.
> > > > > > > All operations worked as expected.
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Regards
> > > > > > > Swapnil
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Nov 12, 2015 at 4:55 PM, Santosh Marella <
> > > > > smare...@maprtech.com <javascript:;>>
> > >
> > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >

Re: [VOTE] Release apache-myriad-0.1.0-incubating (release candidate 2)

2015-11-15 Thread Darin Johnson
+1
On Nov 15, 2015 6:33 PM, "yuliya Feldman" 
wrote:

> Thank you Jim
> You are not late - it is still 3:30 PM PDT :)
> Committers - we need at least one more vote from you for RC2 - you have
> 1.5 hours left.
> Thanks,Yuliya
>   From: Jim Klucar 
>  To: "dev@myriad.incubator.apache.org" 
>  Sent: Sunday, November 15, 2015 3:12 PM
>  Subject: Re: [VOTE] Release apache-myriad-0.1.0-incubating (release
> candidate 2)
>
> Even though I'm late, I had no problems.
>
> +1 NB
>
> On Saturday, November 14, 2015, Santosh Marella 
> wrote:
>
> > A friendly reminder that the vote ends at 4:55 PM (PDT) on Sunday
> > (tomorrow), 11/15.
> >
> > Thanks,
> > Santosh
> >
> > On Fri, Nov 13, 2015 at 5:17 PM, Sarjeet Singh <
> sarjeetsi...@maprtech.com
> > >
> > wrote:
> >
> > > +1 (Non-Binding)
> > >
> > > Verified checksums.
> > > D/L myriad-0.1.0-incubating-rc2.tar.gz, Compiled & deployed it on a 4
> > node
> > > MapR cluster.
> > > Tried FGS/CGS flex up/down, and ran hadoop M/R jobs.
> > > Tried myriad HA with RM restart and kill -9.
> > > Tried framework shutdown, and restart myriad again.
> > > Tried JHS configuration flex up/down and its functionality.
> > >
> > > -Sarjeet
> > >
> > > On Fri, Nov 13, 2015 at 4:17 PM, Aashreya Shankar <
> ashan...@maprtech.com
> > >
> > > wrote:
> > >
> > > > +1 (non binding)
> > > >
> > > > Successfully built binaries from rc2 tar.gz
> > > > Tried it 5 node MapR cluster
> > > > Flex up/down works accordingly
> > > > Hadoop jobs running fine.
> > > > Build was successful through Vagrant
> > > >
> > > > Thank you
> > > > Aashreya
> > > >
> > > > On Fri, Nov 13, 2015 at 3:13 PM, Swapnil Daingade <
> > > > swapnil.daing...@gmail.com > wrote:
> > > >
> > > > > Downloaded rc2 tar.gz
> > > > > * Verified md5 and sha512 hashes successfully
> > > > > * Built binaries successfully
> > > > > * Deployed on 3 node MapR cluster
> > > > > * Tested NM flexup/flexdown with HA enabled and disabled.
> > > > > * Tried HA
> > > > > * Tried Framework Shutdown.
> > > > > All operations worked as expected.
> > > > >
> > > > > +1
> > > > >
> > > > > Regards
> > > > > Swapnil
> > > > >
> > > > >
> > > > > On Thu, Nov 12, 2015 at 4:55 PM, Santosh Marella <
> > > smare...@maprtech.com >
>
>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I have created a build for Apache Myriad 0.1.0-incubating,
> release
> > > > > > candidate 2.
> > > > > >
> > > > > > Thanks to everyone who has contributed to this release.
> > > > > >
> > > > > > Here’s the release notes:
> > > > > > https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> > > > > >
> > > > > > The commit to be voted upon is tagged with
> > > > "myriad-0.1.0-incubating-rc2"
> > > > > > and is available here:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://git1-us-west.apache.org/repos/asf/incubator-myriad/repo?p=incubator-myriad.git;a=commit;h=fb93291e9377cccf625bed93a9ad1ae1c4b76529
> > > > > >
> > > > > > The artifacts to be voted upon are located here:
> > > > > > *
> > > > > >
> > > > >
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc2/
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc2/
> > > > > > >*
> > > > > >
> > > > > > Release artifacts are signed with the following key:
> > > > > > https://people.apache.org/keys/committer/smarella.asc
> > > > > >
> > > > > > Please vote on releasing this package as Apache Myriad
> > > > 0.1.0-incubating.
> > > > > >
> > > > > > The vote is open for the next 72 hours and passes if a majority
> of
> > > > > > at least three +1 PPMC votes are cast.
> > > > > >
> > > > > > [ ] +1 Release this package as Apache Myriad 0.1.0-incubating
> > > > > > [ ]  0 I don't feel strongly about it, but I'm okay with the
> > release
> > > > > > [ ] -1 Do not release this package because...
> > > > > >
> > > > > > Here is my vote:
> > > > > > +1 (binding)
> > > > > >
> > > > > > Thanks,
> > > > > > Santosh
> > > > > >
> > > > >
> > > >
> > >
> >
>
>


Re: New Committer: Swapnil Daingade

2015-11-05 Thread Darin Johnson
Congrats Swapnil!

On Thu, Nov 5, 2015 at 11:16 AM, Sarjeet Singh 
wrote:

> Congrats Swapnil, Very nice work on HA :)
>
> On Thu, Nov 5, 2015 at 8:13 AM, Santosh Marella 
> wrote:
>
> > Congratulations Swapnil.
> >
> > --
> > Sent from mobile
> > On Nov 5, 2015 7:31 AM, "yuliya Feldman" 
> > wrote:
> >
> > > Congratulations Swapnil!!!
> > > Well done
> > > Yuliya
> > >   From: Adam Bordelon 
> > >  To: dev@myriad.incubator.apache.org
> > >  Sent: Thursday, November 5, 2015 4:38 AM
> > >  Subject: New Committer: Swapnil Daingade
> > >
> > > The Podling Project Management Committee (PPMC) for Apache Myriad has
> > asked
> > > Swapnil Daingade to become a committer and PPMC member and we are
> pleased
> > > to announce that he has accepted.
> > >
> > > Please join me in welcoming Swapnil as a Myriad committer, and let's
> > thank
> > > him for all his contributions so far. Looking forward to more!
> > >
> > > Cheers,
> > > -Adam-
> > >
> > >
> > >
> >
>


Re: Struggling with Permissions

2015-11-17 Thread Darin Johnson
Yuliya: Are you referencing yarn.nodemanager.hostname or a MapR-specific
option?

I'm working right now on passing a
-Dyarn.nodemanager.hostname=offer.getHostName().  This is useful if you've
got extra IPs for a SAN or management network.
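
A minimal sketch of what that could look like when building the NodeManager launch options; the class and method are hypothetical helpers, not the actual Myriad change:

    import org.apache.mesos.Protos.Offer;

    public final class NodeManagerOpts {
      // Advertise the hostname from the Mesos offer so the NM binds the address
      // YARN expects, even on hosts with extra SAN/management IPs.
      // Note: the Mesos Java binding spells the getter getHostname().
      static String hostnameOpt(Offer offer) {
        return "-Dyarn.nodemanager.hostname=" + offer.getHostname();
      }
    }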

John: Yeah, the permissions on the tarball are a pain to get right.  I'm
working on Docker support and a build script for the tarball, which should
make things easier.  Also, on the point of using world-writable directories:
it's a little scary from the security side of things to allow executables
to run there, especially things running as privileged users.  Many distros
of Linux will mount /tmp noexec.

Darin

On Tue, Nov 17, 2015 at 2:53 PM, yuliya Feldman  wrote:

> Please change the work directory for the mesos slave to one that is not /tmp
> and make sure that dir is owned by root.
> There is one more caveat with the binary distro and MapR - in Myriad code for
> the binary distro, configuration is copied from RM to NMs - it does not work
> for MapR since we need the hostname (yes, for the sake of local volumes) to be
> unique.
> MapR will have Myriad release to handle this situation.
>   From: John Omernik 
>  To: dev@myriad.incubator.apache.org
>  Sent: Tuesday, November 17, 2015 11:37 AM
>  Subject: Re: Struggling with Permissions
>
> Oh hey, I found a post by me back on Sept 9.  I looked at the Jiras and
> followed the instructions with the same errors. At this point do I still
> need to have a place where the entire path is owned by root? That seems
> like an odd requirement (a change on each node to facilitate a
> framework)
>
>
>
>
>
> On Tue, Nov 17, 2015 at 1:25 PM, John Omernik  wrote:
>
> > Hey all, I am struggling with permissions on myriad, trying to get the
> > right permissions in the tgz as well as who to run as.  I am running in
> > MapR, which means I need to run as mapr or root (otherwise my volume
> > creation scripts will fail on MapR, MapR folks, we should talk more about
> > those scripts)
> >
> > But back to the code, I've had lots of issues. When I run the Frameworkuser
> > and Superuser as mapr, it unpacks everything as MapR and I get a
> > "/bin/container-executor" must be owned by root but is owned by 700 (my
> > mapr UID).
> >
> > So now I am running as root, and I am getting the error below as it
> > relates to /tmp. I am not sure which /tmp this refers to. the /tmp that
> my
> > slave is executing in? (i.e. my local mesos agent /tmp directory) or my
> > MaprFS /tmp directory (both of which are world writable, as /tmp
> typically
> > is... or am I mistaken here?)
> >
> > Any thoughts on how to get this to resolve? This is when nodemanager is
> > trying to start running as root and root for both of my Myriad users.
> >
> > Thanks!
> >
> >
> > Caused by: ExitCodeException exitCode=24: File /tmp must not be world or
> group writable, but is 1777
> >
> >
> >
> >
>
>
>
>


Re: Google Hangout Link

2015-08-26 Thread Darin Johnson
Ken,

I was going through the DCOS guide; are there any service discovery options
in DCOS for finding web service ports (HAProxy, nginx)?

Thanks,
Darin
PS: Sorry about missing the sync; I'm on vacation in the Midwest.
On Aug 26, 2015 12:05 PM, Ken Sipe k...@mesosphere.io wrote:

 I sent an invite… and BTW… the sync notes have the hangout link at the
 top:
  https://docs.google.com/document/d/1JGmJrgeg98bHw_0_sSRmyX6WiAe13OdErcFlaz6Aa04/edit#heading=h.rnolkdpzfc8u

 ken
  On Aug 26, 2015, at 11:04 AM, Brandon Gulla gulla.bran...@gmail.com
 wrote:
 
  Can someone send out the active google hangout link please?
 
  thanks
 
  --
  Brandon




Re: Flex API

2015-08-27 Thread Darin Johnson
One additional thing I'd like is the ability to flex up a NodeManager on
all agents with a certain attribute, as opposed to a fixed number.

Also, when I get back from vacation I plan on scoping MESOS-1739 (dynamic
attributes), which would allow for tighter integration with the HDFS
framework. An alternative would be to get hostnames from the NameNode,
though that is not as seamless.
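
For illustration, an attribute-based flex-up request could look something like the following; the endpoint, payload fields, and constraint syntax are hypothetical and only sketch the idea, they are not the current Myriad API:

    curl -X POST http://myriad-host:8192/api/cluster/flexup \
      -H "Content-Type: application/json" \
      -d '{"profile": "medium", "constraints": ["hdfs_datanode LIKE true"]}'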
(Reviving this thread)

We've discussed several great points in the thread (PUT vs POST, need for
GET, JSON payload vs parameters in URL, declarative interface etc).
Just to get us going, I think we should focus on a couple of things that
will be useful for Myriad users, while leaving them flexible enough to be
evolved in the future.

What I heard from several folks (some of it brought up again at MesosCon)
about the flex up/down APIs is this:
- flexup doesn't support launching NMs on specific set of hosts. This is
especially needed to launch NMs on same set of nodes that have HDFS
DataNode running.
- flexdown lacks an option to shut down NMs with a specific profile. Today,
we bring down ANY arbitrary NM.
- flexdown lacks an option to shutdown NMs running on specific hosts.

I captured my thoughts in a document here:
https://docs.google.com/document/d/1PA_POY_abP6J4youM2Q0VJ48T4OCSe258-OAz_-EO6k/edit#heading=h.1atlx0ag9s8t

@Jim: Happy to collaborate at one single place (Swagger/Google Doc) to
finalize the APIs. Just let me know.

Thanks,
Santosh

On Sun, Jun 14, 2015 at 5:29 PM, Jim Klucar klu...@gmail.com wrote:

 Seems like POST is a winner with people.

 Another thing to consider is how we want the REST interface to be vs what
 we want the UI to do. The UI could support flexup/flexdown like it is
while
 the REST interface is just a declarative state like Adam suggested. The UI
 would just be responsible for translating the request into the new state.

 Tomorrow I'll try to put together another swagger doc with some of the
 suggested options.


 On Sun, Jun 14, 2015 at 6:37 PM, yuliya Feldman
 yufeld...@yahoo.com.invalid
  wrote:

  I think we are at the point to list all the options we want flex API
to
  support.
  1. Do we continue supporting flexup/down or just flex with additional
  preposition like up/down:https://hostname:port/flex/up(down)
  2. I think we should switch to POST and may be maintain PUT for legacy
 (if
  even needed to keep it). We are not DB after all and not storing any
  retrievable info here :)
  3. We need to add status (GET) to see the status - though I think we
have
  one
   4. Define JSON payload to support different cases:
      a. providing different profiles together:
         [{profile:big, instances:2},{profile:medium, instances:6}]
      b. provide what state we want Myriad to be in: I want 10 medium
         instances, and then Myriad will do whatever is necessary to
         transition to that state, adding/removing/resizing NMs
      c. flex/down particular instance IDs
      d. flex up/down preferred hosts, delays, others
   5. How does all this fit into fine-grained scaling? With it we would do
   automatic flex up/down. And the fewer knobs the admin has to turn, the
   easier it is for the admin and the end users.
 
From: Adam Bordelon a...@mesosphere.io
   To: dev@myriad.incubator.apache.org
   Sent: Sunday, June 14, 2015 2:54 PM
   Subject: Re: Flex API
 
  (In addition,) I'd also like to see a more declarative interface.
Instead
  of add two more instances, the user(s) could just specify the desired
  state of I want 10 medium instances and then Myriad will do whatever
is
  necessary to transition to that state, adding/removing/resizing NMs as
  necessary.
 
 
 
  On Fri, Jun 12, 2015 at 5:23 PM, Will Ochandarena 
  wochandar...@maprtech.com
   wrote:
 
   On Fri, Jun 12, 2015 at 5:11 PM, Jim Klucar klu...@gmail.com wrote:
  
What verb to use when outside of database land can be argued. I
would
   vote
for POST over PUT just because I tend to default to POST. PUT was
 there
when I showed up, so I left it.
  
  
   Last time I agonized about PUT vs POST the most logical distinction I
  found
   was that PUT should be used for idempotent operations, while POST for
   non-idempotent (like we have here with flex-up, since instance-ids are
   generated).
  
   Since the api doesn't wait until the
instances are created to return, we can't really return the instance
  IDs
   we
created.
   
  
   That seems OK to me.
  
  
The GET would just return some status?
   
  
   Yeah, I was thinking that this would be needed for a future GUI where
 we
   list all instances with parameters and status for each (profile,
 current
   cpu/ram/disk, node, uptime).  I'm picturing checkboxes next  to each
so
   users can multi-select and hit 'delete' to wipe them away (like
 flex-down
   does now).
  
   The PATCH is interesting
   
  
   Yeah, I started to write PUT but to REST geeks PUT implies you always
  have
   to rewrite the complete object when making changes.  PATCH allows more
   flexible modifications.
  
   The 

Re: Complete Myriad HA implementation

2015-08-24 Thread Darin Johnson
Question on the yarn.resourcemanager.fs.state-store.uri: that's a local fs
path in /var/; if that's where you're keeping the state, how is it regained
if the RM is restarted on a different node in Marathon?  I haven't read
through all the code yet, but I'm trying to get oriented, sorry.
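
For reference, the yarn-site.xml property in question; the value below is just a placeholder DFS URI, and whether a scheme prefix like hdfs:// is required is exactly the open point in this thread:

    <property>
      <name>yarn.resourcemanager.fs.state-store.uri</name>
      <!-- placeholder: a shared DFS path lets an RM restarted elsewhere recover state -->
      <value>hdfs://namenode:8020/yarn/rm-state</value>
    </property>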

On Mon, Aug 24, 2015 at 9:35 PM, Swapnil Daingade 
swapnil.daing...@gmail.com wrote:

 Hi All,

 Here is a document on how to configure and try out Myriad HA.

 https://docs.google.com/document/d/1PPqQmiWgCsxrMEq56fNra2Z6JZI8vlF5HK9fHKmA9_Q/edit

 Please let me know your thoughts.
 Once I incorporate community feedback and Myriad HA makes it to phase1,
 I'll move it to a more
 permanent place like perhaps the wiki.

 Regards
 Swapnil



 On Thu, Aug 20, 2015 at 11:39 AM, Darin Johnson dbjohnson1...@gmail.com
 wrote:

  Sweet, look forward to checking it out.
  Hi All,
 
  I have updated my pull request with the complete Myriad HA implementation
  rebased
  on top of the FGS changes here
 
  https://github.com/mesos/myriad/pull/123
 
  I am planning to send out another email with details on how to configure
  it.
 
  Regards
  Swapnil
 



Re: Complete Myriad HA implementation

2015-09-04 Thread Darin Johnson
I've got some more comments but can't really respond adequately until
tomorrow due to travel.  I'll defer if everyone else is OK with the merge.
Hi All,

I have addressed all the review comments I received since the last PR
update on Monday.
The latest updated PR is here: https://github.com/mesos/myriad/pull/123

I am wondering, if there are no further review comments by Friday, can we
consider merging this pull request?

Regards
Swapnil


On Mon, Aug 24, 2015 at 7:58 PM, Darin Johnson <dbjohnson1...@gmail.com>
wrote:

> Swapnil,
>
> Generally no, they don't which makes it confusing and so I asked.  Started
> playing with it, going to be travelling tomorrow but will try out soon.
>
> Darin
>
>
> On Mon, Aug 24, 2015 at 10:50 PM, Swapnil Daingade <
> swapnil.daing...@gmail.com> wrote:
>
> > Hi Darin,
> >
> > Its a dfs path. I tried it on MapRFS.
> > Does hdfs require it to be prefixed by something like hdfs:// ?
> > If yes, I'll make the change.
> >
> > Regards
> > Swapnil
> >
> >
> > On Mon, Aug 24, 2015 at 7:28 PM, Darin Johnson <dbjohnson1...@gmail.com>
> > wrote:
> >
> > > Question on the yarn.resourcemanager.fs.state-store.uri that's a local
> fs
> > > in /var/ if that's we're you're keeping the state how is it regained
if
> > the
> > > RM is restarted on a different node in marathon?  Haven't read through
> > all
> > > the code yet but I'm trying to get oriented, sorry.
> > >
> > > On Mon, Aug 24, 2015 at 9:35 PM, Swapnil Daingade <
> > > swapnil.daing...@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > Here is a document on how to configure and try out Myriad HA.
> > > >
> > > >
> > >
> >
>
https://docs.google.com/document/d/1PPqQmiWgCsxrMEq56fNra2Z6JZI8vlF5HK9fHKmA9_Q/edit
> > > >
> > > > Please let me know your thoughts.
> > > > Once I incorporate community feedback and Myriad HA makes it to
> phase1,
> > > > I'll move it to a more
> > > > permanent place like perhaps the wiki.
> > > >
> > > > Regards
> > > > Swapnil
> > > >
> > > >
> > > >
> > > > On Thu, Aug 20, 2015 at 11:39 AM, Darin Johnson <
> > dbjohnson1...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Sweet, look forward to checking it out.
> > > > > Hi All,
> > > > >
> > > > > I have updated my pull request with the complete Myriad HA
> > > implementation
> > > > > rebased
> > > > > on top of the FGS changes here
> > > > >
> > > > > https://github.com/mesos/myriad/pull/123
> > > > >
> > > > > I am planning to send out another email with details on how to
> > > configure
> > > > > it.
> > > > >
> > > > > Regards
> > > > > Swapnil
> > > > >
> > > >
> > >
> >
>


Re: Complete Myriad HA implementation

2015-09-05 Thread Darin Johnson
That's fine.
On Sep 5, 2015 4:54 AM, "Swapnil Daingade" <swapnil.daing...@gmail.com>
wrote:

> Hi Darin,
>
> I was wondering if it would be possible to address them in a follow up PR.
>
> Addressed new review comments since Wednesday and updated PR
> https://github.com/mesos/myriad/pull/123
>
> Regards
> Swapnil
>
>
> On Fri, Sep 4, 2015 at 9:55 AM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > I've got some more comments but can really respond adequately until
> > tomorrow due to travel.  I'll defer if everyone else is OK with the
> merge.
> > Hi All,
> >
> > I have address all the review comments I received since the last PR
> update
> > on Monday.
> > The latest updated PR is here https://github.com/mesos/myriad/pull/123
> >
> > I am wondering, if there are no further review comments by Friday, can we
> > considering merging this pull request ?
> >
> > Regards
> > Swapnil
> >
> >
> > On Mon, Aug 24, 2015 at 7:58 PM, Darin Johnson <dbjohnson1...@gmail.com>
> > wrote:
> >
> > > Swapnil,
> > >
> > > Generally no, they don't which makes it confusing and so I asked.
> > Started
> > > playing with it, going to be travelling tomorrow but will try out soon.
> > >
> > > Darin
> > >
> > >
> > > On Mon, Aug 24, 2015 at 10:50 PM, Swapnil Daingade <
> > > swapnil.daing...@gmail.com> wrote:
> > >
> > > > Hi Darin,
> > > >
> > > > Its a dfs path. I tried it on MapRFS.
> > > > Does hdfs require it to be prefixed by something like hdfs:// ?
> > > > If yes, I'll make the change.
> > > >
> > > > Regards
> > > > Swapnil
> > > >
> > > >
> > > > On Mon, Aug 24, 2015 at 7:28 PM, Darin Johnson <
> > dbjohnson1...@gmail.com>
> > > > wrote:
> > > >
> > > > > Question on the yarn.resourcemanager.fs.state-store.uri that's a
> > local
> > > fs
> > > > > in /var/ if that's we're you're keeping the state how is it
> regained
> > if
> > > > the
> > > > > RM is restarted on a different node in marathon?  Haven't read
> > through
> > > > all
> > > > > the code yet but I'm trying to get oriented, sorry.
> > > > >
> > > > > On Mon, Aug 24, 2015 at 9:35 PM, Swapnil Daingade <
> > > > > swapnil.daing...@gmail.com> wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > Here is a document on how to configure and try out Myriad HA.
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> https://docs.google.com/document/d/1PPqQmiWgCsxrMEq56fNra2Z6JZI8vlF5HK9fHKmA9_Q/edit
> > > > > >
> > > > > > Please let me know your thoughts.
> > > > > > Once I incorporate community feedback and Myriad HA makes it to
> > > phase1,
> > > > > > I'll move it to a more
> > > > > > permanent place like perhaps the wiki.
> > > > > >
> > > > > > Regards
> > > > > > Swapnil
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 20, 2015 at 11:39 AM, Darin Johnson <
> > > > dbjohnson1...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Sweet, look forward to checking it out.
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I have updated my pull request with the complete Myriad HA
> > > > > implementation
> > > > > > > rebased
> > > > > > > on top of the FGS changes here
> > > > > > >
> > > > > > > https://github.com/mesos/myriad/pull/123
> > > > > > >
> > > > > > > I am planning to send out another email with details on how to
> > > > > configure
> > > > > > > it.
> > > > > > >
> > > > > > > Regards
> > > > > > > Swapnil
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Question about the Wiki Instructions on yarn-site.xml

2015-09-09 Thread Darin Johnson
Hey John, I'm going to try to recreate the issue using vanilla Hadoop later
today.  Any other settings I should know about?
Darin
On Sep 9, 2015 9:42 AM, "John Omernik"  wrote:

> This was another "slipped in" question in my other thread, I am breaking
> out for specific instructions.  Basically, I was struggling with with some
> things in the wiki on this page:
>
> https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators
>
> In step 5:
> Step 5: Configure YARN to use Myriad
>
> Modify the */opt/hadoop-2.7.0/etc/hadoop/yarn-site.xml* file as instructed
> in Sample: myriad-config-default.yml
> <
> https://cwiki.apache.org/confluence/display/MYRIAD/Sample%3A+myriad-config-default.yml
> >
> .
>
>
> Issue 1: It should link to the yarn-site.xml page, not hte
> myriad-config.default.yml page
>
> Issue 2:
> It has us put that information in the yarn-site.xml This makes sense.  The
> resource manager needs to be aware of the myriad stuff.
>
> Then I go to create a tarball, (which I SHOULD be able to use for both
> resource manager and nodemanager... right?) However, the instructions state
> to remove the *.xml files.
>
> Step 6: Create the Tarball
>
> The tarball has all of the files needed for the Node Managers and  Resource
> Managers. The following shows how to create the tarball and place it in
> HDFS:
> cd ~
> sudo cp -rp /opt/hadoop-2.7.0 .
> sudo rm hadoop-2.7.0/etc/hadoop/*.xml
> sudo tar -zcpf ~/hadoop-2.7.0.tar.gz hadoop-2.7.0
> hadoop fs -put ~/hadoop-2.7.0.tar.gz /dist
>
>
> What I ended up doing... since I am running the resourcemanager (myriad) in
> marathon, is I created two tarballs. One is my hadoop-2.7.0-RM.tar.gz which
> has the all the xml files still in the tar ball for shipping to marathon.
> Then other is hadoop-2.7.0-NM.tar.gz which per the instructions removes the
> *.xml files from the /etc/hadoop/ directory.
>
>
> I guess... my logic is that myriad creates the conf directory for the
> nodemanagers... but then I thought, and I overthinking something? Am I
> missing something? Could that be factoring into what I am doing here?
>
>
> Obviously my first steps are to add the extra yarn-site.xml entries, but in
> this current setup, they are only going into the resource manager yarn-site
> as the the node-managers don't have a yarn-site in their directories.  Am I
> looking at this correctly?  Perhaps we could rethink the removal process of
> the XML files in the tarball to allow this to work correctly with a single
> tarball?
>
> If I am missing something here, please advise!
>
>
> John
>


Re: Myriad Open Source Doc Review

2015-09-26 Thread Darin Johnson
Ruth,

Looked briefly at this; there are significant changes that need to be made
to this document since Swapnil's last PR.  I'll see what I can do about
updating these; I'm rebuilding my dev environment at the moment, so this is
a convenient time.  I am traveling early in the week, though, so it may be
Wednesday before you see changes.

Darin

On Wed, Sep 16, 2015 at 4:35 PM, Ruth Harris  wrote:

> Hi Team,
>
> Could you all take some to review the documentation on the Apache Wiki?
> https://cwiki.apache.org/confluence/display/MYRIAD/Myriad+Home
>
> *Darin*, Could you take a good look at the Installing for Administrators
> section?
>
> https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators
> I'm not sure if there was a resolution from your discussion with John that
> impacts the documentation.
>
> If you could respond with which section(s) that you plan on reviewing, then
> I can make sure that everything gets covered.
>
> Thanks, Ruth
>
> --
> Ruth Harris
> Sr. Technical Writer, MapR
>


Re: Yarn-Site

2015-10-05 Thread Darin Johnson
sudo rm $YARN_HOME/etc/hadoop/yarn-site.xml <- This is slightly off; I'll
go correct it.
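
For completeness: once the marathon json from the quoted message below is
saved to a file (e.g. resource-manager.json, name is your choice), submitting
it is just a POST to Marathon's apps endpoint.  A sketch, assuming Marathon's
default port 8080:

curl -X POST http://<marathon-host>:8080/v2/apps \
     -H "Content-Type: application/json" \
     -d @resource-manager.json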

On Mon, Oct 5, 2015 at 4:38 PM, Darin Johnson <dbjohnson1...@gmail.com>
wrote:

> John, I'm running off: https://github.com/apache/incubator-myriad, it
> seems to run OK there's a couple of NPE issues though they're rare events
> (I had to invent a way to make one occur), I've got a PR for one if you
> want to test/review it for me :).
>
> I'll try to update the wiki with instructions on running the resource
> manager via marathon.
>
> Darin
>
>
> On Mon, Oct 5, 2015 at 4:27 PM, John Omernik <j...@omernik.com> wrote:
>
>> I see. That makes sense.  Thanks for the tip.
>>
>> Is it safe to pull down a recent version at this point? Are we using the
>> official "master" or phase1?  (the lazy man in me is asking for a link to
>> the current repo so I don't have to read back over emails to see where I
>> should go :)
>>
>>
>>
>> On Mon, Oct 5, 2015 at 3:25 PM, Darin Johnson <dbjohnson1...@gmail.com>
>> wrote:
>>
>> > Hey John,
>> >
>> > Are you trying to run the resource manager from the tar ball via
>> marathon?
>> > It's doable, my suggested approach would be to use a json like this:
>> >
>> > {
>> >   "id": "resource-manager",
>> >   "uris": ["hdfs://namenode:port/dist/hadoop-2.7.0.tgz",
>> >  "hdfs://namenode:port/dist/conf/hadoop/yarn-site.xml",
>> >  "hdfs://namenode:port/dist/conf/hadoop/hdfs-site.xml",
>> >  "hdfs:///dist/conf/hadoop/core-site.xml",
>> >  "hdfs://namenode:port/dist/conf/hadoop/mapred-site.xml"],
>> >   "cmd": "cp *.xml hadoop-2.7.0/etc/hadoop && cd hadoop-2.7.0 &&
>> bin/yarn
>> > resourcemanager",
>> >   "mem": 16,
>> >   "cpu": 1
>> >   "instances" : 1,
>> >   "user": "yarn"
>> > }
>> >
>> > Basically it keeps you from redoing the tar ball every time you edit a
>> > config, instead you just upload the new yarn-site.xml.  The Node Manager
>> > gets it's config from the Resource Manager (I'm assuming this is all for
>> > remote distribution, otherwise creating the tar ball is optional).
>> >
>> > Darin
>> >
>> > On Mon, Oct 5, 2015 at 2:36 PM, John Omernik <j...@omernik.com> wrote:
>> >
>> > > Hey all, I've been waiting until the chaos of the code move has died
>> > down.
>> > > I am looking to get this working on my MapR cluster now, and would
>> like
>> > > some clarification on instructions here:
>> > >
>> > >
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators
>> > >
>> > > Basically, in the instructions below, it has the "remove the
>> > > yarn-site.xml.  Yet to run the resource manager with myriad, you need
>> the
>> > > yarn-site to be packaged with things (unless I am reading that
>> > incorrectly)
>> > > Is the only option right now to created a tarball for nodemanagers,
>> and
>> > > have this be different from the tarball for the resource manager?
>> > >
>> > > Step 5: Create the Tarball
>> > >
>> > > The tarball has all of the files needed for the Node Managers and
>> > Resource
>> > > Managers. The following shows how to create the tarball and place it
>> in
>> > > HDFS:
>> > > cd ~
>> > > sudo cp -rp $YARN_HOME .
>> > > sudo rm $YARN_HOME/etc/hadoop/yarn-site.xml
>> > > sudo tar -zcpf ~/hadoop-2.7.1.tar.gz hadoop-2.7.1
>> > > hadoop fs -put ~/hadoop-2.7.1.tar.gz /dist
>> > >
>> >
>>
>
>


Re: [PROPOSAL(s)] Use Release Branches, and Delete Obsolete Branches

2015-12-03 Thread Darin Johnson
My experience with alt 1 is that it takes a lot of discipline, or it devolves
into develop just being master.  I'd be curious how others have found it.
On Dec 3, 2015 10:07 PM, "Darin Johnson" <dbjohnson1...@gmail.com> wrote:

> +1 A, +1 B.
> On Dec 3, 2015 7:12 PM, "Sarjeet Singh" <sarjeetsi...@maprtech.com> wrote:
>
>> +1 for Proposal A -> Alt 1, and +1 for Proposal B.
>>
>> Should we also maintain 'develop' & 'master' branch as described on
>> nvie.com,
>> it was easy to read through the branching model, and understand the
>> branching flow without any complexity involved?
>>
>> Btw, Good pro/con list with references. thanks Adam!!
>>
>> -Sarjeet
>>
>> On Thu, Dec 3, 2015 at 2:49 PM, Santosh Marella <smare...@maprtech.com>
>> wrote:
>>
>> > Yup.
>> >
>> > +1 for Proposal A -> Alternative 1.
>> > +1 for Proposal B
>> >
>> > Santosh
>> >
>> > On Thu, Dec 3, 2015 at 1:03 PM, yuliya Feldman
>> <yufeld...@yahoo.com.invalid
>> > >
>> > wrote:
>> >
>> > > I fully second Todd.
>> > > Thanks,Yuliya
>> > >   From: Todd Richmond <trichm...@maprtech.com>
>> > >  To: dev@myriad.incubator.apache.org
>> > >  Sent: Thursday, December 3, 2015 8:59 AM
>> > >  Subject: Re: [PROPOSAL(s)] Use Release Branches, and Delete Obsolete
>> > > Branches
>> > >
>> > > excellent pro/con list
>> > >
>> > > +1 for either A or + .5 for Alt 1. A branch is easy to follow and
>> allows
>> > > for long term patch support. Each new 0.x.y patch release becomes the
>> > only
>> > > “supported” 0.x version but more than one “x” can be supported
>> > > simultaneously
>> > >
>> > > Alt 2 tags work best when you release very often (such as monthly) to
>> > > users who are willing to upgrade and so can quickly deprecate previous
>> > > releases. Cherry-picking is more manual effort and possibly error
>> prone
>> > as
>> > > the committer count grows
>> > >
>> > > +1 for proposal B. feature branches can usually be done on private
>> forks
>> > > instead and should definitely be removed once the feature is merged to
>> > > develop
>> > >
>> > >   Todd
>> > >
>> > >
>> > >
>> > >
>> > > > On Dec 3, 2015, at 12:36 AM, Adam Bordelon <a...@mesosphere.io>
>> wrote:
>> > > >
>> > > > Proposal A: Use Release Branches
>> > > > I propose that we create a '0.1.x' branch that will contain all of
>> the
>> > > > 0.1.0-rc tags, the final 0.1.0 release tag, and any tags related to
>> > > hotfix
>> > > > releases on top (0.1.1, 0.1.2). A hotfix release like 0.1.1 can
>> > > cherry-pick
>> > > > fixes from master directly on top of the 0.1.0 tag in the 0.1.x
>> branch,
>> > > and
>> > > > we'll iterate on its rc's in the 0.1.x branch. Development for
>> > > > features/fixes for 0.2.0 go into the master branch, and whenever
>> 0.2.0
>> > is
>> > > > feature-complete/ready, we'll cut the new '0.2.x' branch from master
>> > and
>> > > > tag a 0.2.0-rc1, then cherry pick any necessary fixes from master
>> into
>> > > > 0.2.x, for future release candidates and hotfix releases.
>> > > > + Easy to create/review github PRs to merge fixes into release
>> > candidates
>> > > > and hotfix releases.
>> > > > + Release candidates and hotfixes are handled in the appropriate
>> > release
>> > > > branch, while normal development can continue in master.
>> > > > + Minimal additional branches/commands required; no need for
>> ephemeral
>> > > > branches for each release (candidate).
>> > > >
>> > > > Alternative 1: Follow
>> > > >
>> >
>> http://nvie.com/posts/a-successful-git-branching-model/#release-branches
>> > > > My proposal is similar to the model described by nvie except that we
>> > use
>> > > > the myriad 'master' branch for what he calls the 'develop' branch,
>> and
>> > we
>> > > > use our 0.1.x,0.2.x release branches as longer-lived branches for
>> > hotfix
>> > > > releases. (Note: Feature branches are a separ

Re: [PROPOSAL(s)] Use Release Branches, and Delete Obsolete Branches

2015-12-03 Thread Darin Johnson
+1 A, +1 B.
On Dec 3, 2015 7:12 PM, "Sarjeet Singh"  wrote:

> +1 for Proposal A -> Alt 1, and +1 for Proposal B.
>
> Should we also maintain 'develop' & 'master' branch as described on
> nvie.com,
> it was easy to read through the branching model, and understand the
> branching flow without any complexity involved?
>
> Btw, Good pro/con list with references. thanks Adam!!
>
> -Sarjeet
>
> On Thu, Dec 3, 2015 at 2:49 PM, Santosh Marella 
> wrote:
>
> > Yup.
> >
> > +1 for Proposal A -> Alternative 1.
> > +1 for Proposal B
> >
> > Santosh
> >
> > On Thu, Dec 3, 2015 at 1:03 PM, yuliya Feldman
>  > >
> > wrote:
> >
> > > I fully second Todd.
> > > Thanks,Yuliya
> > >   From: Todd Richmond 
> > >  To: dev@myriad.incubator.apache.org
> > >  Sent: Thursday, December 3, 2015 8:59 AM
> > >  Subject: Re: [PROPOSAL(s)] Use Release Branches, and Delete Obsolete
> > > Branches
> > >
> > > excellent pro/con list
> > >
> > > +1 for either A or + .5 for Alt 1. A branch is easy to follow and
> allows
> > > for long term patch support. Each new 0.x.y patch release becomes the
> > only
> > > “supported” 0.x version but more than one “x” can be supported
> > > simultaneously
> > >
> > > Alt 2 tags work best when you release very often (such as monthly) to
> > > users who are willing to upgrade and so can quickly deprecate previous
> > > releases. Cherry-picking is more manual effort and possibly error prone
> > as
> > > the committer count grows
> > >
> > > +1 for proposal B. feature branches can usually be done on private
> forks
> > > instead and should definitely be removed once the feature is merged to
> > > develop
> > >
> > >   Todd
> > >
> > >
> > >
> > >
> > > > On Dec 3, 2015, at 12:36 AM, Adam Bordelon 
> wrote:
> > > >
> > > > Proposal A: Use Release Branches
> > > > I propose that we create a '0.1.x' branch that will contain all of
> the
> > > > 0.1.0-rc tags, the final 0.1.0 release tag, and any tags related to
> > > hotfix
> > > > releases on top (0.1.1, 0.1.2). A hotfix release like 0.1.1 can
> > > cherry-pick
> > > > fixes from master directly on top of the 0.1.0 tag in the 0.1.x
> branch,
> > > and
> > > > we'll iterate on its rc's in the 0.1.x branch. Development for
> > > > features/fixes for 0.2.0 go into the master branch, and whenever
> 0.2.0
> > is
> > > > feature-complete/ready, we'll cut the new '0.2.x' branch from master
> > and
> > > > tag a 0.2.0-rc1, then cherry pick any necessary fixes from master
> into
> > > > 0.2.x, for future release candidates and hotfix releases.
> > > > + Easy to create/review github PRs to merge fixes into release
> > candidates
> > > > and hotfix releases.
> > > > + Release candidates and hotfixes are handled in the appropriate
> > release
> > > > branch, while normal development can continue in master.
> > > > + Minimal additional branches/commands required; no need for
> ephemeral
> > > > branches for each release (candidate).
> > > >
> > > > Alternative 1: Follow
> > > >
> > http://nvie.com/posts/a-successful-git-branching-model/#release-branches
> > > > My proposal is similar to the model described by nvie except that we
> > use
> > > > the myriad 'master' branch for what he calls the 'develop' branch,
> and
> > we
> > > > use our 0.1.x,0.2.x release branches as longer-lived branches for
> > hotfix
> > > > releases. (Note: Feature branches are a separate discussion,
> unrelated
> > to
> > > > release management.)
> > > > + Easy to follow guide.
> > > > + Good separation of concerns/responsibility.
> > > > - Doesn't explain how release candidates are handled.
> > > > - So many branches.
> > > >
> > > > Alternative 2: Use tags for releases, no branches (like Mesos does)
> > > > See the discussion at:
> > > >
> > >
> >
> http://stackoverflow.com/questions/9810050/git-tag-vs-release-beta-branches
> > > > + No mess of branches in the repo; no merging between branches.
> > > > + Since release candidates and releases are cherry-picked and tagged,
> > > > normal development can continue on master without
> > > interruption/corruption.
> > > > - Github PRs cannot use a tag (Dealbreaker?).
> > > > http://stackoverflow.com/a/12279290/4056606
> > > >
> > > > Please let me know your thoughts on release branches. I went ahead
> and
> > > > created the '0.1.x' branch from the 0.1.0-rc3 tag so you can check it
> > out
> > > > and play around, and so you can push 0.2.0 features to master without
> > > > worrying about messing up the 0.1.0 release. We can cherry-pick any
> > > > rc4/0.1.1 patches out of master, and we can always
> delete/rename/reorg
> > > the
> > > > release branch later if desired.
> > > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/heads/0.1.x
> > > >
> > > >
> > > >
> > > > Proposal B: Delete all these obsolete branches from the Apache git
> > repo:
> > > > 9/23/15 phase1 (72 behind 

Re: Merges to Myriad master branch

2015-12-01 Thread Darin Johnson
I like Adam's idea for RCs, but I also really like the idea of a 0.1.0
branch for bug fixes.  That way we can cut a 0.1.1 maintenance release much
more easily than trying to cut one off master.  Seems like a lot of Apache
projects handle it that way.

Darin

Otherwise there will likely be some really ugly merges.
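
For the record, the flow Adam describes would look roughly like this in git
(a sketch; the 0.1.1 tag name and the commit sha are placeholders):

# cut a maintenance branch from the last voted release candidate tag
git checkout -b 0.1.x myriad-0.1.0-incubating-rc3
# pull a specific fix over from master
git cherry-pick <commit-sha-from-master>
# tag and push the next candidate or patch release from the branch
git tag myriad-0.1.1-incubating-rc1
git push origin 0.1.x myriad-0.1.1-incubating-rc1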
On Mon, Nov 30, 2015 at 1:49 PM, Adam Bordelon  wrote:

> Sounds like a code freeze/thaw. What are the conditions for a *major* PR?
> Even a minor PR could introduce major bugs.
> I will point out that you have the option of cherry-picking specific new
> patches on top of the 0.1.0-rc3 tag to create a new rc. This ensures that
> 0.1.0 only includes changes that were tested in the previous rc's plus
> specific critical fixes. This is how Mesos handles patch releases (e.g.
> 0.23.0 -> 0.23.1) or release candidates after the first. Cut the rc0/1
from
> HEAD, then cherry-pick on top for all future rcs.
>
>
>
>
Do you mean cherry-pick into branches (e.g. a 0.1.x branch) instead of a tag,
which is supposed to be immutable?  Having a branch also enables future
releases based on the 0.1.0 release.



--
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Re: Myriad is 0.1.0

2015-12-10 Thread Darin Johnson
Sweet, thanks for all the work on the release guys!

On Thu, Dec 10, 2015 at 1:31 PM, mohit soni  wrote:

> WooHoo! Perfect holiday gift. Thanks everyone.
>
> On Thu, Dec 10, 2015 at 10:19 AM, Aashreya Shankar 
> wrote:
>
> > This is great !
> > Good job everyone.
> >
> > On Thu, Dec 10, 2015 at 8:02 AM, Yuliya 
> > wrote:
> >
> > > Great news!!!
> > >
> > > We are finally there
> > >
> > >
> > > > On Dec 10, 2015, at 1:13 AM, Santosh Marella 
> > > wrote:
> > > >
> > > > Hi All,
> > > >
> > > >  Congratulations on a the first Apache Myriad release..! Kudos to
> > > everyone
> > > > involved for making this happen.
> > > >
> > > >  As we now have IPMC's approval, there are a few things that I did to
> > > wrap
> > > > up the release:
> > > >  - Make 0.1.0 artifacts available from the release SVN repo [1].
> > > >  - Git tag the voted RC as the "0.1.0" release [2].
> > > >  - Delete the previously marked git RC tags.
> > > >  - Closed the remaining JIRAs marked for 0.1.0 version and marked the
> > > > 0.1.0 version as "released" [3].
> > > >  - Submitted a PR [4] with scripts to prepare a RC and release a RC
> > > > (automates the above git and svn steps)
> > > >  - Updated the release guide [5] with voting links to help future
> > release
> > > > managers.
> > > >
> > > >  Here are a couple of things still remaining:
> > > >  - Update the downloads page on Myriad's website with links to 0.1.0
> > > > artifacts on svn, git tag, release notes.
> > > >  - Do an announcement blog post. Here is a draft [6]. Please suggest
> > any
> > > > changes.
> > > >
> > > >  If I may be missing something for 0.1.0, appreciate if you could
> bring
> > > to
> > > > my notice.
> > > >
> > > >   1.
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/release/incubator/myriad/myriad-0.1.0-incubating/
> > > >   2.
> > > >
> > >
> >
> https://github.com/apache/incubator-myriad/releases/tag/myriad-0.1.0-incubating
> > > >   3.
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/MYRIAD/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel
> > > >   4. https://github.com/apache/incubator-myriad/pull/53
> > > >   5.
> https://cwiki.apache.org/confluence/display/MYRIAD/Release+Guide
> > > >   6.
> > > >
> > >
> >
> https://docs.google.com/document/d/1zCXnDlqzNhj0BL_CqRz5-poCap9QFah7R3tKkHdspYg/edit
> > > >
> > > > Thanks,
> > > > Santosh
> > >
> >
>


Re: Next dev sync hangout will be on 1/6/2016

2016-01-06 Thread Darin Johnson
Trying to join
On Jan 6, 2016 12:06 PM, "yuliya Feldman" 
wrote:

> Do we have a sync today?
>
>
>   From: Santosh Marella 
>  To: dev@myriad.incubator.apache.org
>  Sent: Wednesday, December 16, 2015 9:47 AM
>  Subject: Next dev sync hangout will be on 1/6/2016
>
> We have decided to hold the next dev sync on 1/6/2016 (instead of
> 12/30/2015).
>
> Meeting notes from today's hangout:
>
> https://docs.google.com/document/d/1JGmJrgeg98bHw_0_sSRmyX6WiAe13OdErcFlaz6Aa04/edit#
>
> Thanks,
> Santosh
>
>
>


Re: Struggling with Permissions

2015-11-17 Thread Darin Johnson
John,

I'm not super familiar with MapR, but I think I might have some thoughts, and
the MapR people can chime in :).

I think the mapr.host thing is due to the fact that in the remote distribution,
Myriad pulls its config from the resource manager.  As I mentioned in my
note to Yuliya, I'm working on adding the ability to pass
yarn.nodemanager.hostname as a -D option.  I think the right thing may be to
expose an environment variable $HOSTNAME, and then in yarnEnvironment: you
could set a YARN_OPTS=-Dmapr.hostname=$HOSTNAME
-Dyarn.nodemanager.hostname=$HOSTNAME ... option.
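
A rough sketch of what that might look like in myriad-config-default.yml
(hypothetical; the $HOSTNAME substitution is exactly the part that doesn't
exist yet):

yarnEnvironment:
  YARN_HOME: hadoop-2.7.0
  YARN_OPTS: "-Dmapr.hostname=$HOSTNAME -Dyarn.nodemanager.hostname=$HOSTNAME"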

One could imagine a similar option for ports as this is kind of what
Marathon does.

Maybe best to JIRA this, as I don't think we necessarily expose a lot of
things we should just yet.


On Tue, Nov 17, 2015 at 4:41 PM, John Omernik <j...@omernik.com> wrote:

> What's even stranger is I can't for life of me find where "mapr.host" gets
> set or used.  I did a grep -P -R "mapr\.host" ./*  in /opt/mapr (which
> included me pulling down the myriad code into
> /opt/mapr/myriad/incubator-myriad) and found only one reference in
> /opt/mapr/server/mapr_yarn_install.sh
>
> <property>
>   <name>yarn.nodemanager.hostname</name>
>   <value>\${mapr.host}</value>
> </property>
> " | sudo tee -a ${YARN_CONF_FILE}
>
>
> But I don't think that is being called at all by the resource manager...
>
>
> (Note when I create my tarball from /opt/mapr/hadoop/hadoop-2.7.0 directory
> I am using tar -zcfhp  to both preserver permissions and include the files
> that symlinked... not sure if that affects things here.. )
>
>
>
>
>
> On Tue, Nov 17, 2015 at 3:15 PM, John Omernik <j...@omernik.com> wrote:
>
> > Well sure /tmp is world writeable but /tmp/mesos is not world writable
> > thus there is a sandbox to play in there... or am I missing something.
> Not
> > to mention my tmp is rwt which is world writable but only the creator or
> > root can modify (based on the googles).
> > Yuliya:
> >
> > I am seeing a weird behavior with MapR as it relates to (I believe) the
> > mapr_direct_shuffle.
> >
> > In the Node Manager logs, I see things starting and it saying "Checking
> > for local volume, if local volume is not present command will create and
> > mount it"
> >
> > Command invoked is : /opt/mapr/server/createTTVolume.sh
> > hadoopmapr7.brewingintel.com /var/mapr/local/
> > hadoopmapr2.brewingintel.com/mapred /var/mapr/local/
> > hadoopmapr2.brewingintel.com/mapred/nodeManager yarn
> >
> >
> > What is interesting here is hadoopmapr7 is the nodemanager it's trying to
> > start on, however the mount point it's trying to create is hadoopmapr2
> > which is the node the resource manager happened to fall on...  I was very
> > confused by that because in no place should hadoopmapr2 be "known" to the
> > nodemanager, because it thinks the resource manager hostname is
> > myriad.marathon.mesos.
> >
> > So why was it hard coding to the node the resource manager is running on?
> >
> > Well if I look at the conf file in the sandbox (the file that gets copied
> > to be yarn-site.xml for node managers.  There ARE four references the
> > hadoopmapr2. Three of the four say "source programatically" and one is
> just
> > set... that's mapr.host.  Could there be some down stream hinkyness going
> > on with how MapR is setting hostnames?  All of these variables seem
> "wrong"
> > in that mapr.host (on the node manager) should be hadoopmapr7 in this
> case,
> > and the resource managers should all be myriad.marathon.mesos.   I'd be
> > interested in your thoughts here, because I am stumped at how these are
> > getting set.
> >
> >
> >
> >
> >
> > <property><name>yarn.resourcemanager.address</name><value>hadoopmapr2:8032</value><source>programatically</source></property>
> > <property><name>mapr.host</name><value>hadoopmapr2.brewingintel.com</value></property>
> > <property><name>yarn.resourcemanager.resource-tracker.address</name><value>hadoopmapr2:8031</value><source>programatically</source></property>
> > <property><name>yarn.resourcemanager.admin.address</name><value>hadoopmapr2:8033</value><source>programatically</source></property>
> >
> >
> >
> >
> >
> > On Tue, Nov 17, 2015 at 2:51 PM, Darin Johnson <dbjohnson1...@gmail.com>
> > wrote:
> >
> >> Yuliya: Are you referencing yarn.nodemanager.hostname or a mapr specific
> >> option?
> >>
> >> I'm working right now on passing a
> >> -Dyarn.nodemanager.hostname=offer.getHostName().  Useful if you've got
> >> extra ip's for a san or management network.
> >>
> >> John: Yeah the permissions on the tarball are a pain to get right.  I'm
> >> working on Docker Support and a build script for the tarball, which
> shoul

Re: [VOTE] Release apache-myriad-0.1.0-incubating (release candidate 3)

2015-11-23 Thread Darin Johnson
+1
Downloaded, built, and ran on a 4-node vanilla Hadoop cluster (remote distro).
Verified md5 and sha hashes.
Ran M/R jobs on CGS/FGS.
Flexed nodes up and down.
Killed the RM with HA enabled; verified the nodes came back up.

Noticed that if the JHS is up in HA mode and the RM is restarted, a new JHS is
launched, resulting in two JHS instances running (minor; should fix in next release).
Noticed that if HA is enabled and the framework timeout expires, you must
manually delete the state store (minor; should add documentation).

On Mon, Nov 23, 2015 at 10:34 PM, Sarjeet Singh 
wrote:

> +1 (Non-Binding)
>
> Verified md5 & sha512 checksums.
> D/L myriad-0.1.0-incubating-rc3.tar.gz, install gradle, Compiled code &
> deployed it on a 4 node MapR cluster.
> Tried FGS/CGS NM flex up/down, and ran hadoop M/R jobs.
> Tried myriad HA with RM restart/kill.
> Tried framework shutdown, and start myriad again.
> Tried JHS configuration flex up/down and its functionality.
>
> -Sarjeet
>
> On Thu, Nov 19, 2015 at 10:37 PM, Santosh Marella 
> wrote:
>
> > Hi All,
> >
> > Firstly, thanks everyone for the valuable contributions to the project
> and
> > for holding on tight as we move along the release process. We're almost
> > home!
> >
> > I have created a source tar ball for Apache Myriad 0.1.0-incubating,
> > release candidate 3. This includes the feedback from the recent IPMC
> > voting.
> > Here’s the release notes:
> > https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> >
> > The commit to be voted upon is tagged with "myriad-0.1.0-incubating-rc3"
> > and is available here:
> >
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.1.0-incubating-rc3
> >
> > The artifacts to be voted upon are located below. Please note that this
> is
> > a source release:
> >
> >
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc3/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/smarella.asc
> >
> > **Please note that the release tar ball does not include the gradlew
> script
> > to build. You need to generate one in order to build.**
> >
> > Please try out the release candidate and vote. The vote is open for a
> > minimum of 72 hours or until the necessary number of votes (3 binding
> +1s)
> > is reached.
> >
> > If/when this vote succeeds, I will call for a vote with IPMC seeking
> > permission to release RC3 as Apache Myriad 0.1.0 (incubating).
> >
> > [ ] +1 Release this package as Apache Myriad 0.1.0-incubating
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> > Thanks,
> > Santosh
> >
>


Re: [VOTE] Release apache-myriad-0.1.0-incubating (release candidate 3)

2015-11-23 Thread Darin Johnson
Any guidance on why we can't include the gradlew script?  I took a look at
Apache Kafka and Apache Tapestry; their artifacts both include gradlew,
just not the wrapper jar.

Meanwhile I'll check the release ...
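
For anyone else building from the source tar ball, regenerating the wrapper
should just be (assuming a local Gradle install on the path):

gradle wrapper   # regenerates gradlew and the gradle/wrapper files
./gradlew build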

Darin

On Mon, Nov 23, 2015 at 4:55 PM, yuliya Feldman  wrote:

> When I download release gradlew file is not there, so I can not build
> Am I missing anything?
> Thanks,Yuliya
>   From: Santosh Marella 
>  To: dev@myriad.incubator.apache.org
>  Sent: Monday, November 23, 2015 12:36 PM
>  Subject: Re: [VOTE] Release apache-myriad-0.1.0-incubating (release
> candidate 3)
>
> +1 (binding)
>
> - Downloaded the RC
> - Verified signatures
> - Built binaries by following instructions from the README page [1]
> - Deployed binaries on a 2.7.0 MapR cluster as described in the
> documentation [2]
> - Verified the following:
>   - flexup/flexdown NMs of zero/low/medium profiles from Myriad's Web UI.
>   - Successfully ran a 10G terasort M/R job with 60 mappers and 10 reducers
>   - Myriad/RM HA: Killed RM and restarted it while a M/R job is running.
>   - Verified the job completed successfully.
>   - Verified that the Myriad UI showed the active NM tasks correctly.
>   - Verified "Shutdown Framework"
>
> [1] https://github.com/apache/incubator-myriad#build-myriad
> [2]
>
> https://github.com/apache/incubator-myriad/blob/master/docs/myriad-dev.md#step-2-deploy-the-myriad-binaries
>
> Santosh
>
>
>
> On Thu, Nov 19, 2015 at 10:37 PM, Santosh Marella 
> wrote:
>
> > Hi All,
> >
> > Firstly, thanks everyone for the valuable contributions to the project
> and
> > for holding on tight as we move along the release process. We're almost
> > home!
> >
> > I have created a source tar ball for Apache Myriad 0.1.0-incubating,
> > release candidate 3. This includes the feedback from the recent IPMC
> > voting.
> > Here’s the release notes:
> > https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> >
> > The commit to be voted upon is tagged with "myriad-0.1.0-incubating-rc3"
> > and is available here:
> >
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.1.0-incubating-rc3
> >
> > The artifacts to be voted upon are located below. Please note that this
> > is a source release:
> >
> >
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.1.0-incubating-rc3/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/smarella.asc
> >
> > **Please note that the release tar ball does not include the gradlew
> > script to build. You need to generate one in order to build.**
> >
> > Please try out the release candidate and vote. The vote is open for a
> > minimum of 72 hours or until the necessary number of votes (3 binding
> > +1s) is reached.
> >
> > If/when this vote succeeds, I will call for a vote with IPMC seeking
> > permission to release RC3 as Apache Myriad 0.1.0 (incubating).
> >
> > [ ] +1 Release this package as Apache Myriad 0.1.0-incubating
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> > Thanks,
> > Santosh
> >
>
>
>


Re: Next dev sync hangout will be on 1/6/2016

2016-01-08 Thread Darin Johnson
Sounds good to me.  I think we might have another possible east coast
contributor join.

Darin

On Thu, Jan 7, 2016 at 9:09 PM, Adam Bordelon <a...@mesosphere.io> wrote:

> Sorry, this slipped off my calendar, so I didn't even try to attend. I
> guess we'll pick up again next week?
>
> On Wed, Jan 6, 2016 at 11:22 AM, Paul Curtis <pcur...@maprtech.com> wrote:
>
> > I attempted to join as well  it seems that no one was in the
> > hangout. I gave up after about 15 minutes.
> >
> > paul
> >
> > On Wed, Jan 6, 2016 at 12:19 PM, Darin Johnson <dbjohnson1...@gmail.com>
> > wrote:
> > > Can't seem to join...
> > > On Jan 6, 2016 12:16 PM, "Darin Johnson" <dbjohnson1...@gmail.com>
> > wrote:
> > >
> > >> Trying to join
> > >> On Jan 6, 2016 12:06 PM, "yuliya Feldman" <yufeld...@yahoo.com.invalid
> >
> > >> wrote:
> > >>
> > >>> Do we have a sync today?
> > >>>
> > >>>
> > >>>   From: Santosh Marella <smare...@maprtech.com>
> > >>>  To: dev@myriad.incubator.apache.org
> > >>>  Sent: Wednesday, December 16, 2015 9:47 AM
> > >>>  Subject: Next dev sync hangout will be on 1/6/2016
> > >>>
> > >>> We have decided to hold the next dev sync on 1/6/2016 (instead of
> > >>> 12/30/2015).
> > >>>
> > >>> Meeting notes from today's hangout:
> > >>>
> > >>>
> >
> https://docs.google.com/document/d/1JGmJrgeg98bHw_0_sSRmyX6WiAe13OdErcFlaz6Aa04/edit#
> > >>>
> > >>> Thanks,
> > >>> Santosh
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> >
> >
> >
> > --
> > Paul Curtis - Senior Product Technologist
> > O: +1 203-660-0015 - M: +1 203-539-9705
> >
> > Now Available - Free Hadoop On-Demand Training
> >
>


Re: Next dev sync hangout will be on 1/6/2016

2016-01-12 Thread Darin Johnson
I'll be available.

On Tue, Jan 12, 2016 at 3:05 PM, Adam Bordelon <a...@mesosphere.io> wrote:

> I'll be ready for a meeting tomorrow.
>
> On Tue, Jan 12, 2016 at 11:03 AM, Sarjeet Singh <sarjeetsi...@maprtech.com
> >
> wrote:
>
> > +1.
> >
> > Can we confirm this for tomorrow if this is happening?
> >
> > -Sarjeet
> >
> > On Fri, Jan 8, 2016 at 6:53 PM, Darin Johnson <dbjohnson1...@gmail.com>
> > wrote:
> >
> > > Sounds good to me.  I think we might have another possible east coast
> > > contributor join.
> > >
> > > Darin
> > >
> > > On Thu, Jan 7, 2016 at 9:09 PM, Adam Bordelon <a...@mesosphere.io>
> > wrote:
> > >
> > > > Sorry, this slipped off my calendar, so I didn't even try to attend.
> I
> > > > guess we'll pick up again next week?
> > > >
> > > > On Wed, Jan 6, 2016 at 11:22 AM, Paul Curtis <pcur...@maprtech.com>
> > > wrote:
> > > >
> > > > > I attempted to join as well  it seems that no one was in the
> > > > > hangout. I gave up after about 15 minutes.
> > > > >
> > > > > paul
> > > > >
> > > > > On Wed, Jan 6, 2016 at 12:19 PM, Darin Johnson <
> > > dbjohnson1...@gmail.com>
> > > > > wrote:
> > > > > > Can't seem to join...
> > > > > > On Jan 6, 2016 12:16 PM, "Darin Johnson" <
> dbjohnson1...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > >> Trying to join
> > > > > >> On Jan 6, 2016 12:06 PM, "yuliya Feldman"
> > > <yufeld...@yahoo.com.invalid
> > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Do we have a sync today?
> > > > > >>>
> > > > > >>>
> > > > > >>>   From: Santosh Marella <smare...@maprtech.com>
> > > > > >>>  To: dev@myriad.incubator.apache.org
> > > > > >>>  Sent: Wednesday, December 16, 2015 9:47 AM
> > > > > >>>  Subject: Next dev sync hangout will be on 1/6/2016
> > > > > >>>
> > > > > >>> We have decided to hold the next dev sync on 1/6/2016 (instead
> of
> > > > > >>> 12/30/2015).
> > > > > >>>
> > > > > >>> Meeting notes from today's hangout:
> > > > > >>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1JGmJrgeg98bHw_0_sSRmyX6WiAe13OdErcFlaz6Aa04/edit#
> > > > > >>>
> > > > > >>> Thanks,
> > > > > >>> Santosh
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Paul Curtis - Senior Product Technologist
> > > > > O: +1 203-660-0015 - M: +1 203-539-9705
> > > > >
> > > > > Now Available - Free Hadoop On-Demand Training
> > > > >
> > > >
> > >
> >
>


Re: problem getting fine grained scaling working

2016-06-03 Thread Darin Johnson
That is correct: you need at least one node manager with enough resources to
launch an ApplicationMaster.  Otherwise YARN will throw an
exception.
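
If every NM in the cluster is running the zero profile, flexing one up with a
non-zero profile should unblock it.  I believe the call is something like this
against the Myriad REST API (a sketch; adjust the host and the profile name to
match your myriad-config-default.yml):

curl -X PUT http://<rm-host>:8192/api/cluster/flexup \
     -H "Content-Type: application/json" \
     -d '{"instances": 1, "profile": "small"}'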

On Fri, Jun 3, 2016 at 10:52 AM, yuliya Feldman  wrote:

> I believe you need at least one NM that is not subject to fine grain
> scaling.
> So far if total resources on the cluster is less then a single container
> needs for AM you won't be able to submit any app.As exception below tells
> you.
> (Invalid resource request, requested memory < 0, or requested memory >max
> configured, requestedMemory=1536, maxMemory=0
> at)
> I believe by default when starting Myriad cluster one NM with non 0
> capacity should start by default.
> In addition see in RM log whether offers with resources are coming to RM -
> this info should be in the log.
>
>   From: Stephen Gran 
>  To: "dev@myriad.incubator.apache.org" 
>  Sent: Friday, June 3, 2016 1:29 AM
>  Subject: problem getting fine grained scaling workig
>
> Hi,
>
> I'm trying to get fine grained scaling going on a test mesos cluster.  I
> have a single master and 2 agents.  I am running 2 node managers with
> the zero profile, one per agent.  I can see both of them in the RM UI
> reporting correctly as having 0 resources.
>
> I'm getting stack traces when I try to launch a sample application,
> though.  I feel like I'm just missing something obvious somewhere - can
> anyone shed any light?
>
> This is on a build of yesterday's git head.
>
> Cheers,
>
> root@master:/srv/apps/hadoop# bin/yarn jar
> share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar teragen 1
> /outDir
> 16/06/03 08:23:33 INFO client.RMProxy: Connecting to ResourceManager at
> master.testing.local/10.0.5.3:8032
> 16/06/03 08:23:34 INFO terasort.TeraSort: Generating 1 using 2
> 16/06/03 08:23:34 INFO mapreduce.JobSubmitter: number of splits:2
> 16/06/03 08:23:34 INFO mapreduce.JobSubmitter: Submitting tokens for
> job: job_1464902078156_0001
> 16/06/03 08:23:35 INFO mapreduce.JobSubmitter: Cleaning up the staging
> area /tmp/hadoop-yarn/staging/root/.staging/job_1464902078156_0001
> java.io.IOException:
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException:
> Invalid resource request, requested memory < 0, or requested memory >
> max configured, requestedMemory=1536, maxMemory=0
> at
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
> at
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
> at
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
> at
>
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at
>
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
> at
>
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
> at
>
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
> at
>
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
> at
>
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
> at
>
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>
> at
> org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
> at
>
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
> 

Re: problem getting fine grained scaling working

2016-06-03 Thread Darin Johnson
That is normal behavior: Myriad holds the resources needed to flex up a node
manager for a few seconds in case a job comes in, and then releases them.  The
INFO statement is arguably chatty and will probably move to DEBUG in a few
more releases.
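
In the meantime, if the noise bothers you, that logger can be turned down in
the RM's log4j.properties (a sketch, using the class name from your log):

# quiets the per-offer capacity updates from fine-grained scaling
log4j.logger.org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager=WARN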


On Fri, Jun 3, 2016 at 9:18 AM, Stephen Gran 
wrote:

> Hi,
>
> Not sure if this is relevant, but I see this in the RM logs:
>
> 2016-06-03 13:06:55,466 INFO
> org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Setting
> capacity for node slave1.testing.local to 
> 2016-06-03 13:06:55,467 INFO
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
> Update resource on node: slave1.testing.local from:  vCores:0>, to: 
> 2016-06-03 13:06:55,467 INFO
> org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Setting
> capacity for node slave1.testing.local to 
> 2016-06-03 13:06:55,470 INFO
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
> Update resource on node: slave1.testing.local from:  vCores:6>, to: 
>
>
> This is happening for each nodemanager, repeating every 5 or 6 seconds.
>   I'm assuming this will be the NM sending the actual capacity report to
> the RM, for use in updating YARN's view of available resource.  I don't
> know if it should be going back and forth like it is, though?
>
> Cheers,
>
> On 03/06/16 09:29, Stephen Gran wrote:
> > Hi,
> >
> > I'm trying to get fine grained scaling going on a test mesos cluster.  I
> > have a single master and 2 agents.  I am running 2 node managers with
> > the zero profile, one per agent.  I can see both of them in the RM UI
> > reporting correctly as having 0 resources.
> >
> > I'm getting stack traces when I try to launch a sample application,
> > though.  I feel like I'm just missing something obvious somewhere - can
> > anyone shed any light?
> >
> > This is on a build of yesterday's git head.
> >
> > Cheers,
> >
> > root@master:/srv/apps/hadoop# bin/yarn jar
> > share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar teragen 1
> > /outDir
> > 16/06/03 08:23:33 INFO client.RMProxy: Connecting to ResourceManager at
> > master.testing.local/10.0.5.3:8032
> > 16/06/03 08:23:34 INFO terasort.TeraSort: Generating 1 using 2
> > 16/06/03 08:23:34 INFO mapreduce.JobSubmitter: number of splits:2
> > 16/06/03 08:23:34 INFO mapreduce.JobSubmitter: Submitting tokens for
> > job: job_1464902078156_0001
> > 16/06/03 08:23:35 INFO mapreduce.JobSubmitter: Cleaning up the staging
> > area /tmp/hadoop-yarn/staging/root/.staging/job_1464902078156_0001
> > java.io.IOException:
> > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException:
> > Invalid resource request, requested memory < 0, or requested memory >
> > max configured, requestedMemory=1536, maxMemory=0
> >  at
> >
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
> >  at
> >
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
> >  at
> >
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
> >  at
> >
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> >  at
> >
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
> >  at
> >
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
> >  at
> >
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
> >  at
> >
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
> >  at
> >
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
> >  at
> >
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> >  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> >  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> >  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> >  at java.security.AccessController.doPrivileged(Native Method)
> >  at javax.security.auth.Subject.doAs(Subject.java:422)
> >  at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> >  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> >
> >  at
> org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
> >  at
> >
> 

Re: problem getting fine grained scaling working

2016-06-05 Thread Darin Johnson
Hey Stephen,

I think you're pretty close.

Looking at the config I'd suggest removing these properties:

   
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>12</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
  <description>Whether virtual memory limits will be enforced for
  containers</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
  <description>Ratio between virtual memory to physical memory when
  setting memory limits for containers</description>
</property>

I'll try them out on my test cluster later today/tonight and see if I can
recreate the problem.  What version of hadoop are you running?  I'll make
sure I'm consistent with that as well.

Thanks,

Darin
On Jun 5, 2016 8:15 AM, "Stephen Gran" <stephen.g...@piksel.com> wrote:

> Hi,
>
> Attached.  Thanks very much for looking.
>
> Cheers,
>
> On 05/06/16 12:51, Darin Johnson wrote:
> > Hey Steven can you please send your yarn-site.xml, I'm guessing you're on
> > the right track.
> >
> > Darin
> > Hi,
> >
> > OK.  That helps, thank you.  I think I just misunderstood the docs (or
> > they never said explicitly that you did need at least some static
> > resource), and I scaled down the initial nm.medium that got started.  I
> > get a bit further now, and jobs start but are killed with:
> >
> > Diagnostics: Container
> > [pid=3865,containerID=container_1465112239753_0001_03_01] is running
> > beyond virtual memory limits. Current usage: 50.7 MB of 0B physical
> > memory used; 2.6 GB of 0B virtual memory used. Killing container
> >
> > When I've seen this in the past with yarn but without myriad, it was
> > usually about ratios of vmem to mem and things like that - I've tried
> > some of those knobs, but I didn't expect much result and didn't get any.
> >
> > What strikes me about the error message is that the vmem and mem
> > allocations are for 0.
> >
> > I'm sorry for asking what are probably naive questions here, I couldn't
> > find a different forum.  If there is one, please point me there so I
> > don't disrupt the dev flow here.
> >
> > I can see this in the logs:
> >
> >
> > 2016-06-05 07:39:25,687 INFO
> >
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> > container_1465112239753_0001_03_01 Container Transitioned from NEW
> > to ALLOCATED
> > 2016-06-05 07:39:25,688 INFO
> > org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root
> >  OPERATION=AM Allocated ContainerTARGET=SchedulerApp
> > RESULT=SUCCESS  APPID=application_1465112239753_0001
> > CONTAINERID=container_1465112239753_0001_03_01
> > 2016-06-05 07:39:25,688 INFO
> > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> > Assigned container container_1465112239753_0001_03_01 of capacity
> > <memory:0, vCores:0> on host slave2.testing.local:26688, which has 1
> > containers, <memory:0, vCores:0> used and <memory:4096, vCores:1>
> > available after allocation
> > 2016-06-05 07:39:25,689 INFO
> >
> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
> > Sending NMToken for nodeId : slave2.testing.local:26688 for container :
> > container_1465112239753_0001_03_01
> > 2016-06-05 07:39:25,696 INFO
> >
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> > container_1465112239753_0001_03_01 Container Transitioned from
> > ALLOCATED to ACQUIRED
> > 2016-06-05 07:39:25,696 INFO
> >
> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
> > Clear node set for appattempt_1465112239753_0001_03
> > 2016-06-05 07:39:25,696 INFO
> >
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> > Storing attempt: AppId: application_1465112239753_0001 AttemptId:
> > appattempt_1465112239753_0001_03 MasterContainer: Container:
> > [ContainerId: container_1465112239753_0001_03_01, NodeId:
> > slave2.testing.local:26688, NodeHttpAddress: slave2.testing.local:24387,
> > Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind:
> > ContainerToken, service: 10.0.5.5:26688 }, ]
> > 2016-06-05 07:39:25,697 INFO
> >
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> > appattempt_1465112239753_0001_03 State change from SCHEDULED to
> > ALLOCATED_SAVING
> > 2016-06-05 07:39:25,698 INFO
> >
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttem

Re: problem getting fine grained scaling working

2016-06-05 Thread Darin Johnson
Stephen,

I was able to recreate the problem (specific to 2.7.2; they changed the
defaults on the following two properties to true).  Setting them to false
allowed me to run MapReduce jobs again.  I'll try to update the
documentation later today.

  

<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

Darin

On Sun, Jun 5, 2016 at 10:30 AM, Stephen Gran <stephen.g...@piksel.com>
wrote:

> Hi,
>
> I think those are the properties I added when I started getting this
> error.  Removing them doesn't seem to make any difference, sadly.
>
> This is hadoop 2.7.2
>
> Cheers,
>
> On 05/06/16 14:45, Darin Johnson wrote:
> > Hey Stephen,
> >
> > I think you're pretty close.
> >
> > Looking at the config I'd suggest removing these properties:
> >
> > <property>
> >   <name>yarn.nodemanager.resource.memory-mb</name>
> >   <value>4096</value>
> > </property>
> > <property>
> >   <name>yarn.scheduler.maximum-allocation-vcores</name>
> >   <value>12</value>
> > </property>
> > <property>
> >   <name>yarn.scheduler.maximum-allocation-mb</name>
> >   <value>8192</value>
> > </property>
> > <property>
> >   <name>yarn.nodemanager.vmem-check-enabled</name>
> >   <value>false</value>
> >   <description>Whether virtual memory limits will be enforced for
> > containers</description>
> > </property>
> > <property>
> >   <name>yarn.nodemanager.vmem-pmem-ratio</name>
> >   <value>4</value>
> >   <description>Ratio between virtual memory to physical memory when
> > setting memory limits for containers</description>
> > </property>
> >
> > I'll try them out on my test cluster later today/tonight and see if I can
> > recreate the problem.  What version of hadoop are you running?  I'll make
> > sure I'm consistent with that as well.
> >
> > Thanks,
> >
> > Darin
> > On Jun 5, 2016 8:15 AM, "Stephen Gran" <stephen.g...@piksel.com> wrote:
> >
> >> Hi,
> >>
> >> Attached.  Thanks very much for looking.
> >>
> >> Cheers,
> >>
> >> On 05/06/16 12:51, Darin Johnson wrote:
> >>> Hey Steven can you please send your yarn-site.xml, I'm guessing you're
> on
> >>> the right track.
> >>>
> >>> Darin
> >>> Hi,
> >>>
> >>> OK.  That helps, thank you.  I think I just misunderstood the docs (or
> >>> they never said explicitly that you did need at least some static
> >>> resource), and I scaled down the initial nm.medium that got started.  I
> >>> get a bit further now, and jobs start but are killed with:
> >>>
> >>> Diagnostics: Container
> >>> [pid=3865,containerID=container_1465112239753_0001_03_01] is
> running
> >>> beyond virtual memory limits. Current usage: 50.7 MB of 0B physical
> >>> memory used; 2.6 GB of 0B virtual memory used. Killing container
> >>>
> >>> When I've seen this in the past with yarn but without myriad, it was
> >>> usually about ratios of vmem to mem and things like that - I've tried
> >>> some of those knobs, but I didn't expect much result and didn't get
> any.
> >>>
> >>> What strikes me about the error message is that the vmem and mem
> >>> allocations are for 0.
> >>>
> >>> I'm sorry for asking what are probably naive questions here, I couldn't
> >>> find a different forum.  If there is one, please point me there so I
> >>> don't disrupt the dev flow here.
> >>>
> >>> I can see this in the logs:
> >>>
> >>>
> >>> 2016-06-05 07:39:25,687 INFO
> >>>
> >>
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> >>> container_1465112239753_0001_03_01 Container Transitioned from NEW
> >>> to ALLOCATED
> >>> 2016-06-05 07:39:25,688 INFO
> >>> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root
> >>>   OPERATION=AM Allocated ContainerTARGET=SchedulerApp
> >>> RESULT=SUCCESS  APPID=application_1465112239753_0001
> >>> CONTAINERID=container_1465112239753_0001_03_01
> >>> 2016-06-05 07:39:25,688 INFO
> >>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> >>> Assigned container container_1465112239753_0001_03_01 of capacity
> >>> <memory:0, vCores:0> on host slave2.testing.local:26688, which has 1
> >>> containers, <memory:0, vCores:0> used and <memory:4096, vCores:1>
> >>> available after allocation
> >>> 20

Re: problem getting fine grained scaling working

2016-06-08 Thread Darin Johnson
Will do today.  If you'd like to help with the documentation, I could give
you access.

On Wed, Jun 8, 2016 at 3:14 AM, Stephen Gran <stephen.g...@piksel.com>
wrote:

> Hi,
>
> Can someone with access please correct the screenshot here:
> https://cwiki.apache.org/confluence/display/MYRIAD/Fine-grained+Scaling
>
> This gives the strong impression that you don't need an NM with non-zero
> resources.  I think this is what initially steered me down the wrong path.
>
> Cheers,
>
> On 03/06/16 16:38, Darin Johnson wrote:
> > That is correct you need at least one node manager with the minimum
> > requirements to launch an ApplicationMaster.  Otherwise YARN will throw
> an
> > exception.
> >
> > On Fri, Jun 3, 2016 at 10:52 AM, yuliya Feldman
> <yufeld...@yahoo.com.invalid
> >> wrote:
> >
> >> I believe you need at least one NM that is not subject to fine grain
> >> scaling.
> >> So far if total resources on the cluster is less then a single container
> >> needs for AM you won't be able to submit any app.As exception below
> tells
> >> you.
> >> (Invalid resource request, requested memory < 0, or requested memory
> >max
> >> configured, requestedMemory=1536, maxMemory=0
> >>  at)
> >> I believe by default when starting Myriad cluster one NM with non 0
> >> capacity should start by default.
> >> In addition see in RM log whether offers with resources are coming to
> RM -
> >> this info should be in the log.
> >>
> >>From: Stephen Gran <stephen.g...@piksel.com>
> >>   To: "dev@myriad.incubator.apache.org" <
> dev@myriad.incubator.apache.org>
> >>   Sent: Friday, June 3, 2016 1:29 AM
> >>   Subject: problem getting fine grained scaling workig
> >>
> >> Hi,
> >>
> >> I'm trying to get fine grained scaling going on a test mesos cluster.  I
> >> have a single master and 2 agents.  I am running 2 node managers with
> >> the zero profile, one per agent.  I can see both of them in the RM UI
> >> reporting correctly as having 0 resources.
> >>
> >> I'm getting stack traces when I try to launch a sample application,
> >> though.  I feel like I'm just missing something obvious somewhere - can
> >> anyone shed any light?
> >>
> >> This is on a build of yesterday's git head.
> >>
> >> Cheers,
> >>
> >> root@master:/srv/apps/hadoop# bin/yarn jar
> >> share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar teragen 1
> >> /outDir
> >> 16/06/03 08:23:33 INFO client.RMProxy: Connecting to ResourceManager at
> >> master.testing.local/10.0.5.3:8032
> >> 16/06/03 08:23:34 INFO terasort.TeraSort: Generating 1 using 2
> >> 16/06/03 08:23:34 INFO mapreduce.JobSubmitter: number of splits:2
> >> 16/06/03 08:23:34 INFO mapreduce.JobSubmitter: Submitting tokens for
> >> job: job_1464902078156_0001
> >> 16/06/03 08:23:35 INFO mapreduce.JobSubmitter: Cleaning up the staging
> >> area /tmp/hadoop-yarn/staging/root/.staging/job_1464902078156_0001
> >> java.io.IOException:
> >> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException:
> >> Invalid resource request, requested memory < 0, or requested memory >
> >> max configured, requestedMemory=1536, maxMemory=0
> >>  at
> >>
> >>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
> >>  at
> >>
> >>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
> >>  at
> >>
> >>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
> >>  at
> >>
> >>
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> >>  at
> >>
> >>
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
> >>  at
> >>
> >>
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
> >>  at
> >>
> >>
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
> >>  at
> >>
> >>
> org.apa

Re: problem getting fine grained scaling working

2016-06-06 Thread Darin Johnson
No worries, keep me posted.  I think we did a good proof of concept; we're
trying to make it solid now, so if you find any issues let us know.

Darin
On Jun 5, 2016 2:57 PM, "Stephen Gran" <stephen.g...@piksel.com> wrote:

> Hi,
>
> Brilliant!  Working now.
>
> Thank you very much,
>
> On 05/06/16 18:09, Darin Johnson wrote:
> > Stephen,
> >
> > I was able to recreate the problem (specific to 2.7.2; they changed the
> > defaults on the following two properties to true).  Setting them to false
> > allowed me to run MapReduce jobs again.  I'll try to update the
> > documentation later today.
> >
> >
> >
> > <property>
> >   <name>yarn.nodemanager.pmem-check-enabled</name>
> >   <value>false</value>
> > </property>
> >
> > <property>
> >   <name>yarn.nodemanager.vmem-check-enabled</name>
> >   <value>false</value>
> > </property>
> >
> >
> >
> > Darin
> >
> > On Sun, Jun 5, 2016 at 10:30 AM, Stephen Gran <stephen.g...@piksel.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I think those are the properties I added when I started getting this
> >> error.  Removing them doesn't seem to make any difference, sadly.
> >>
> >> This is hadoop 2.7.2
> >>
> >> Cheers,
> >>
> >> On 05/06/16 14:45, Darin Johnson wrote:
> >>> Hey Stephen,
> >>>
> >>> I think you're pretty close.
> >>>
> >>> Looking at the config I'd suggest removing these properties:
> >>>
> >>> <property>
> >>>   <name>yarn.nodemanager.resource.memory-mb</name>
> >>>   <value>4096</value>
> >>> </property>
> >>> <property>
> >>>   <name>yarn.scheduler.maximum-allocation-vcores</name>
> >>>   <value>12</value>
> >>> </property>
> >>> <property>
> >>>   <name>yarn.scheduler.maximum-allocation-mb</name>
> >>>   <value>8192</value>
> >>> </property>
> >>> <property>
> >>>   <name>yarn.nodemanager.vmem-check-enabled</name>
> >>>   <value>false</value>
> >>>   <description>Whether virtual memory limits will be enforced for
> >>> containers</description>
> >>> </property>
> >>> <property>
> >>>   <name>yarn.nodemanager.vmem-pmem-ratio</name>
> >>>   <value>4</value>
> >>>   <description>Ratio between virtual memory to physical memory when
> >>> setting memory limits for containers</description>
> >>> </property>
> >>>
> >>> I'll try them out on my test cluster later today/tonight and see if I
> can
> >>> recreate the problem.  What version of hadoop are you running?  I'll
> make
> >>> sure I'm consistent with that as well.
> >>>
> >>> Thanks,
> >>>
> >>> Darin
> >>> On Jun 5, 2016 8:15 AM, "Stephen Gran" <stephen.g...@piksel.com>
> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Attached.  Thanks very much for looking.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> On 05/06/16 12:51, Darin Johnson wrote:
> >>>>> Hey Steven can you please send your yarn-site.xml, I'm guessing
> you're
> >> on
> >>>>> the right track.
> >>>>>
> >>>>> Darin
> >>>>> Hi,
> >>>>>
> >>>>> OK.  That helps, thank you.  I think I just misunderstood the docs
> (or
> >>>>> they never said explicitly that you did need at least some static
> >>>>> resource), and I scaled down the initial nm.medium that got
> started.  I
> >>>>> get a bit further now, and jobs start but are killed with:
> >>>>>
> >>>>> Diagnostics: Container
> >>>>> [pid=3865,containerID=container_1465112239753_0001_03_01] is
> >> running
> >>>>> beyond virtual memory limits. Current usage: 50.7 MB of 0B physical
> >>>>> memory used; 2.6 GB of 0B virtual memory used. Killing container
> >>>>>
> >>>>> When I've seen this in the past with yarn but without myriad, it was
> >>>>> usually about ratios of vmem to mem and things like that - I've tried
> >>>>> some of those knobs, but I didn't expect much result and didn't get
> >> any.
> >>>>>
> >>>>> What strikes me about the error message is that the vmem and mem
> >>>>> allocations are for 0.
> >>>>>
> >>>>> I'm sorry for asking what are probably naive questions here, I
> c

Re: [Vote] Release apache-myriad-0.2.0-incubating (release candidate 4)

2016-06-09 Thread Darin Johnson
The vote for 0.2.0 RC4 has concluded and passed.  Thanks to everyone who
verified the release and voted!

Binding +1's
Darin Johnson
Santosh Marella
Mohit Soni

Non-Binding +1's
John Yost
Sarjeet Singh
Brandon Gulla

I'll submit the release to the IPMC to vote.

Darin

On Thu, Jun 9, 2016 at 8:49 PM, Adam Bordelon <a...@mesosphere.io> wrote:

> Looks like we got our 3rd binding vote! Let's announce the result and ask
> Incubator PMC to begin their vote. Darin, let me/Santosh know if you need
> advice on this part of the process.
>
>
> On Thu, Jun 9, 2016 at 5:44 PM, Brandon Gulla <gulla.bran...@gmail.com>
> wrote:
>
> > +1
> >
> > built and tested on a test cluster. great work guys.
> >
> > On Thu, Jun 9, 2016 at 6:39 PM, mohit soni <mohitsoni1...@gmail.com>
> > wrote:
> >
> > > +1 (Binding)
> > >
> > > - Verified signature
> > > - Verified MD5 and SHA512 hashes
> > > - Builds from source tar ball.
> > > - Installed Myriad on a Mesos cluster and ran sanity tests.
> > >
> > > Thanks
> > > Mohit
> > >
> > > On Thu, Jun 2, 2016 at 7:25 AM, John Yost <hokiege...@gmail.com>
> wrote:
> > >
> > > > I'm voting +1
> > > >
> > > > --John
> > > >
> > > > On Tue, May 24, 2016 at 10:46 PM, Darin Johnson <
> > dbjohnson1...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > I'm voting +1 (Binding)
> > > > >
> > > > > Verified md5/sha hashes.  Compiled with gradle build, gradle
> > > > buildRMDocker
> > > > > (on OSX with docker-machine).
> > > > >
> > > > > Ran remote distribution (with cgroups) on a 4 node cluster (Ubuntu,
> > > > > hadoop-2.6.0, hadoop 2.7.0) with one CGS NM and 3 FGS NM.  Ran 8
> > > > > simultaneous jobs.  Shut down Framework.  Restarted NodeManager,
> ran
> > an
> > > > > additional 3 jobs.
> > > > >
> > > > > Ran the same with docker (minus cgroups).
> > > > >
> > > > > Darin
> > > > >
> > > > > On Tue, May 24, 2016 at 10:40 PM, Darin Johnson <
> > > dbjohnson1...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I have created a source tar ball for Apache Myriad
> > 0.2.0-incubating,
> > > > > > release candidate 3 based off the feed back received from release
> > > > > > candidate 1,2 & 3.  Thanks Sarjeet for a very thorough review!
> > > > > >
> > > > > > Here’s the release notes:
> > > > > > https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> > > > > >
> > > > > > The commit to be voted upon is tagged with "myriad-0.2.0-incubating-rc4"
> > > > > > and is available here:
> > > > > >
> > > > > > https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc4
> > > > > >
> > > > > > The artifacts to be voted upon are located below. Please note that this
> > > > > > is a source release:
> > > > > >
> > > > > > https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc4/
> > > > > >
> > > > > > Release artifacts are signed with the following key:
> > > > > > *https://home.apache.org/~darinj/gpg/2AAE9E3F.asc
> > > > > > <https://home.apache.org/~darinj/gpg/2AAE9E3F.asc>*
> > > > > >
> > > > > > **Please note that the release tar ball does not include the
> > gradlew
> > > > > script
> > > > > > to build. You need to install gradle in order to build.**
> > > > > >
> > > > > > Please try out the release candidate and vote. The vote is open
> > for a
> > > > > > minimum of 3 business days (Friday May 27) or until the necessary
> > > > number
> > > > > > of votes (3 binding +1s)
> > > > > > is reached.
> > > > > >
> > > > > > If/when this vote succeeds, I will call for a vote with IPMC
> > seeking
> > > > > > permission to release RC3 as Apache Myriad 0.2.0 (incubating).
> > > > > >
> > > > > > [ ] +1 Release this package as Apache Myriad 0.2.0-incubating
> > > > > > [ ]  0 I don't feel strongly about it, but I'm okay with the
> > release
> > > > > > [ ] -1 Do not release this package because...
> > > > > >
> > > > > > Thanks,
> > > > > > Darin
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Brandon
> >
>


Re: Myriad hangout tomorrow?

2016-06-14 Thread Darin Johnson
I'm planning on calling in.

Darin

On Tue, Jun 14, 2016 at 4:25 PM, Swapnil Daingade <
swapnil.daing...@gmail.com> wrote:

> Hi All,
>
> Was wondering if we have a Myriad hangout tomorrow.
>
> Regards
> Swapnil
>


Re: [Vote] Release apache-myriad-0.2.0-incubating (release candidate 4)

2016-06-02 Thread Darin Johnson
Hey all I need one more committer vote for RC4.  John and I have a lot of
other improvements we want to start working on but are waiting to cut a
stable release first.

Darin

On Mon, May 30, 2016 at 4:33 PM, sarjeet singh <sarje...@usc.edu> wrote:

> +1 (Non-binding)
>
> Verified md5 and sha512 checksums.
> D/L myriad-0.2.0-incubating-rc4.tar.gz, Compiled & deployed it on a 1 node
> MapR cluster.
> Tried FGS/CGS flex up/down, and ran long/short running M/R jobs.
> Tried framework shutdown from UI/API, and tried re-launching myriad again.
> Tried Cgroups and able to launch NMs w/ cgroups enabled successfully.
>
> - Sarjeet Singh
>
> On Fri, May 27, 2016 at 3:36 PM, Santosh Marella <smare...@maprtech.com>
> wrote:
>
> > +1 (Binding).
> >
> > - Verified signature
> > - Verified MD5 and SHA512 hashes
> > - Builds from source tar ball.
> > - Ran Apache RAT. Verified that all the sources have license headers.
> > - Verified CGS/FGS behaviors with MapReduce jobs on a 4 node Mesos/Yarn
> > cluster.
> >
> > Thanks,
> > Santosh
> >
> > On Tue, May 24, 2016 at 7:46 PM, Darin Johnson <dbjohnson1...@gmail.com>
> > wrote:
> >
> > > I'm voting +1 (Binding)
> > >
> > > Verified md5/sha hashes.  Compiled with gradle build, gradle
> > buildRMDocker
> > > (on OSX with docker-machine).
> > >
> > > Ran remote distribution (with cgroups) on a 4 node cluster (Ubuntu,
> > > hadoop-2.6.0, hadoop 2.7.0) with one CGS NM and 3 FGS NM.  Ran 8
> > > simultaneous jobs.  Shut down Framework.  Restarted NodeManager, ran an
> > > additional 3 jobs.
> > >
> > > Ran the same with docker (minus cgroups).
> > >
> > > Darin
> > >
> > > On Tue, May 24, 2016 at 10:40 PM, Darin Johnson <
> dbjohnson1...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have created a source tar ball for Apache Myriad 0.2.0-incubating,
> > > > release candidate 3 based off the feed back received from release
> > > > candidate 1,2 & 3.  Thanks Sarjeet for a very thorough review!
> > > >
> > > > Here’s the release notes:
> > > > https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> > > >
> > > > The commit to be voted upon is tagged with
> > "myriad-0.2.0-incubating-rc4"
> > > > and is available here:
> > > >
> > > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc
> > > > <
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc4
> > > >
> > > > 4
> > > >
> > > > The artifacts to be voted upon are located below. Please note that
> this
> > > is
> > > > a source release:
> > > >
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc4/
> > > >
> > > > Release artifacts are signed with the following key:
> > > > *https://home.apache.org/~darinj/gpg/2AAE9E3F.asc
> > > > <https://home.apache.org/~darinj/gpg/2AAE9E3F.asc>*
> > > >
> > > > **Please note that the release tar ball does not include the gradlew
> > > script
> > > > to build. You need to install gradle in order to build.**
> > > >
> > > > Please try out the release candidate and vote. The vote is open for a
> > > > minimum of 3 business days (Friday May 27) or until the necessary
> > number
> > > > of votes (3 binding +1s)
> > > > is reached.
> > > >
> > > > If/when this vote succeeds, I will call for a vote with IPMC seeking
> > > > permission to release RC3 as Apache Myriad 0.2.0 (incubating).
> > > >
> > > > [ ] +1 Release this package as Apache Myriad 0.2.0-incubating
> > > > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > > > [ ] -1 Do not release this package because...
> > > >
> > > > Thanks,
> > > > Darin
> > > >
> > >
> >
>


Re: Podling Report Reminder - June 2016

2016-06-01 Thread Darin Johnson
Thanks Adam, I also was unable to edit the wiki (tried to add Santosh's
report, before I saw Adam did).

On Wed, Jun 1, 2016 at 8:07 PM, Adam Bordelon <a...@mesosphere.io> wrote:

> Updated the wiki. Looks great. Thanks Santosh!
> https://wiki.apache.org/incubator/June2016
>
> On Wed, Jun 1, 2016 at 2:50 PM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > Santosh looks good thanks!
> > On Jun 1, 2016 5:35 PM, "Santosh Marella" <smare...@maprtech.com> wrote:
> >
> > > Hi Adam,
> > >
> > >   I have put together the following report. Can you please review it
> and
> > > add it to incubator wiki (I don't have permissions)?
> > >
> > > Thanks,
> > > Santosh.
> > >
> > >
> > >
> >
> 
> > > Myriad has been incubating since 2015-03-01.
> > >
> > > Three most important issues to address in the move towards graduation:
> > >
> > >   1. Develop project roadmap for longer term community/user engagement.
> > >   2. Release frequently - 0.2.0 is underway, but has taken ~6 months
> > since
> > > last release.
> > >   3. Expand community - users/contributors/committers.
> > >
> > > Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> > > aware of?
> > >
> > >   None.
> > >
> > > How has the community developed since the last report?
> > >
> > >   - dev@ mailing list experienced a low in March, but picked up
> traffic
> > > leading up to 0.2.0 release. 141 messages since the last report.
> > >   - 5 new members on the dev@ mailing list. 2 new contributors.
> > >   - Myriad was presented at ApacheCon Vancouver and at couple of other
> > > meetups. Talks submitted at various conferences.
> > >   - Bi-weekly dev syncs happening steadily. Approx. 4-7 members
> > > participate. Minutes at http://s.apache.org/8kF
> > >
> > > How has the project developed since the last report?
> > >
> > >   - Myriad 0.2.0 out for PPMC voting. DarinJ is driving the release.
> > >   - 12 commits since 4/1.
> > >   - 13 JIRAs fixed/resolved.
> > >
> > > Date of last release:
> > >
> > >   2015-12-09 myriad-0.1.0-incubating released
> > >
> > > When were the last committers or PMC members elected?
> > >
> > >   2015-10-05 Darin J
> > >   2015-10-14 Swapnil Daingade
> > >
> > >
> > > Signed-off-by:
> > >
> > >   [ ](myriad) Benjamin Hindman
> > >   [ ](myriad) Danese Cooper
> > >   [ ](myriad) Ted Dunning
> > >   [ ](myriad) Luciano Resende
> > >
> > >
> > >
> >
> --
> > >
> > > On Fri, May 27, 2016 at 11:55 PM, Adam Bordelon <a...@mesosphere.io>
> > > wrote:
> > >
> > > > I'll be pretty busy too, but if I'm not too delirious after my
> MesosCon
> > > > presentation on June 4th, I should be able to spend 30min putting
> > > something
> > > > together before EoD. If anybody else wants to draft a response, I'd
> be
> > > > happy to review it and add it to the Incubator wiki (if you don't
> have
> > > > permissions yourself).
> > > > The (real) link for our June report is:
> > > > http://wiki.apache.org/incubator/June2016
> > > > For previous Myriad reports, see:
> > > > http://wiki.apache.org/incubator/March2016
> > > > http://wiki.apache.org/incubator/December2015
> > > >
> > > >
> > > > On Fri, May 27, 2016 at 4:32 PM, Darin Johnson <
> > dbjohnson1...@gmail.com>
> > > > wrote:
> > > >
> > > > > I just saw this, I'm unfortunately going to be super busy until
> June
> > 4
> > > > and
> > > > > don't have the experience.  If someone else can handle this it'd be
> > > > great,
> > > > > if I get a copy I'll take a stab at the next one.
> > > > > On May 26, 2016 8:40 PM, <johndam...@apache.org> wrote:
> > > > >
> > > > > Dear podling,
> > > > >
> > > > > This email was sent by an automated system on behalf of the Apache
> > > > > Incubator PMC. It is an initial reminder to give you plenty of time
> > to
> > > >

Re: Myriad Slack

2016-06-22 Thread Darin Johnson
Sam,

I don't believe so.  But we do have an IRC channel #myriad on FreeNode.  I
know the mesosphere guys set up slackbots to interact with it.  I'm only
there occasionally or by appointment. I did notice Kudu now uses slack, so
maybe slack makes more sense than IRC these days, or Gitter Chat.

Darin

On Wed, Jun 22, 2016 at 1:55 AM, Sam Chen  wrote:

> Guys,
> Do we have Slack for Myriad?
>
> Regards ,
> Sam
>
> Sent from my iPhone
>
>


[RESULT] [VOTE] Release Apache Myriad 0.2.0 (incubating)

2016-06-20 Thread Darin Johnson
The vote passed with 3 +1 binding votes from IPMC members and no -1s.

+1 binding votes:
Justin Mclean
Drew Farris
John Ament

We will proceed with the post release activities:
  - Make the release artifacts available from [1] and [2]
  - github tag with "myriad-0.2.0-incubating"
  - Close the "myriad-0.2.0" release in JIRA.
  - Announce the release on Myriad's website with a blog post.

1. https://dist.apache.org/repos/dist/release/incubator/myriad/
2. http://myriad.incubator.apache.org/downloads/


Re: Myriad Vagrant Setup Issue

2016-01-15 Thread Darin Johnson
Hey Matt, if you look at the mesos ui is there any information in the
stderr or stdout of the Slave Host it's staging on?

Darin

On Fri, Jan 15, 2016 at 10:36 AM, Matthew J. Loppatto <
mloppa...@keywcorp.com> wrote:

> I've gotten a little farther on this issue by increasing the mesos slave
> memory to 4 GB from 2 GB.  The node manager task gets launched and sits in
> the STAGING state for a minute and then the mesos-slave.INFO log shows:
>
> I0115 15:19:12.114537 30903 slave.cpp:3841] Terminating executor
> myriad_executor20160115-145750-344821002-5050-30838-20160115-145750-344821002-5050-30838-O18020160115-145750-344821002-5050-30838-S0
> of framework 20160115-145750-344821002-5050-30838- because it did not
> register within 1mins
>
> I then increased the mesos slave's executor_registration_timeout setting
> from 1mins to 5mins to see if that would make a difference but still get
> the following in the log:
>
> I0115 15:19:12.114537 30903 slave.cpp:3841] Terminating executor
> myriad_executor20160115-145750-344821002-5050-30838-20160115-145750-344821002-5050-30838-O18020160115-145750-344821002-5050-30838-S0
> of framework 20160115-145750-344821002-5050-30838- because it did not
> register within 5mins
>
> Is there any guidance on why the Myriad executor fails to register with
> the Mesos slave?
>
> Thanks,
> Matt
>
> -Original Message-
> From: Matthew J. Loppatto
> Sent: Thursday, January 14, 2016 2:25 PM
> To: 'dev@myriad.incubator.apache.org'
> Subject: RE: Myriad Vagrant Setup Issue
>
> Sarjeet,
>
> Thanks for the reply.  I modified the medium profile in my
> myriad-config-default.yml file to use 1 cpu and 1024 MB mem and am seeing a
> similar issue in the YARN resource manager log:
>
> Offer not sufficient for task with, cpu: 1.4, memory: 2432.0, ports: 1001
>
> If I try lowering the medium profile memory below 1024 I get the following
> message in the log:
>
> NodeManager from vagrant-ubuntu-trusty-64 doesn’t satisfy minimum
> allocations, Sending SHUTDOWN signal to NodeManager.
>
> Increasing the memory of the VM to 6 GB also didn't solve the issue.  Are
> there any other measures I can take to resolve the insufficient resource
> messages?
>
> Thanks,
> Matt
>
> -Original Message-
> From: sarjeet singh [mailto:sarje...@usc.edu]
> Sent: Thursday, January 14, 2016 12:41 PM
> To: dev@myriad.incubator.apache.org
> Subject: Re: Myriad Vagrant Setup Issue
>
> Matthew,
>
> You can modify profile configurations for Nodemanagers in
> myriad-config-default.yml and reduce medium (default) NM configuration to
> match with your VM capacity so a default NM (medium profile) could launch
> without any issue.
>
> - Sarjeet Singh
>
> On Thu, Jan 14, 2016 at 10:56 PM, Matthew J. Loppatto <
> mloppa...@keywcorp.com> wrote:
>
> > Hi,
> >
> > I'm trying to setup Myriad for an R project at my company but I'm
> > having some trouble even getting the Vagrant VM working properly.  I
> > followed the instructions here:
> >
> > https://github.com/apache/incubator-myriad/blob/master/docs/vagrant.md
> >
> > with some minor corrections but the Node Manager fails to start.  It
> > looks like a resource issue based on the log output.  The Mesos UI
> > shows a slave process with 2 cpu and 2 GB mem, but the log states the
> > task requires 4 cpu and 5.5 GB mem.
> >
> > I've detailed my configuration and log output in this public Gist:
> >
> > https://gist.github.com/FearTheParrot/626259c23a854645fcbf
> >
> > Would it be possible to provision the Mesos slave with more resources
> > while also reducing the profile size of the Node Manager?  The Vagrant
> > VM only has 4 GB ram and 2 cpu.
> >
> > Any help would be appreciated.
> >
> > Thanks!
> > Matt
> >
>


Re: Myriad Vagrant Setup Issue

2016-01-15 Thread Darin Johnson
Matt, if you can't access the UI, on the slave you should still be able to
access stderr and stdout going to:

/tmp/mesos/slaves/<slave-id>/frameworks/<framework-id>/executors/myriad_executor/runs/latest/stderr

/tmp/mesos/slaves/<slave-id>/frameworks/<framework-id>/executors/myriad_executor/runs/latest/stdout
Replace /tmp/mesos/ with your workdir (likely /var/run/mesos/ or
/tmp/mesos).  The error messages here are usually informative.
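
For example, something like this will dump the tail of every Myriad executor
log under the slave work dir (a sketch, assuming the default work dir of
/tmp/mesos; adjust the path to your --work_dir):

  find /tmp/mesos/slaves -type f \( -name stderr -o -name stdout \) \
    -path "*myriad_executor*" -exec tail -n 50 {} +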

On Fri, Jan 15, 2016 at 11:13 AM, Matthew J. Loppatto <
mloppa...@keywcorp.com> wrote:

> Hey Darin,
>
> For some reason my Mesos UI hangs when loading the logs but I posted the
> contents of my mesos slave logs in /var/log/mesos to this public Gist:
> https://gist.github.com/FearTheParrot/b00aa7eee9ae169498d3
>
> Matt
>
> -----Original Message-
> From: Darin Johnson [mailto:dbjohnson1...@gmail.com]
> Sent: Friday, January 15, 2016 10:55 AM
> To: Dev
> Subject: Re: Myriad Vagrant Setup Issue
>
> Hey Matt, if you look at the mesos ui is there any information in the
> stderr or stdout of the Slave Host it's staging on?
>
> Darin
>
> On Fri, Jan 15, 2016 at 10:36 AM, Matthew J. Loppatto <
> mloppa...@keywcorp.com> wrote:
>
> > I've gotten a little farther on this issue by increasing the mesos
> > slave memory to 4 GB from 2GB.  The node manager task get launched and
> > sits in the STAGING state for a minute and then the mesos-slave.INFO log
> shows:
> >
> > I0115 15:19:12.114537 30903 slave.cpp:3841] Terminating executor
> > myriad_executor20160115-145750-344821002-5050-30838-20160115-14575
> > 0-344821002-5050-30838-O18020160115-145750-344821002-5050-30838-S0
> > of framework 20160115-145750-344821002-5050-30838- because it did
> > not register within 1mins
> >
> > I then increased the mesos slave's executor_registration_timeout
> > setting from 1mins to 5mins to see if that would make a difference but
> > still get the following in the log:
> >
> > I0115 15:19:12.114537 30903 slave.cpp:3841] Terminating executor
> > myriad_executor20160115-145750-344821002-5050-30838-20160115-14575
> > 0-344821002-5050-30838-O18020160115-145750-344821002-5050-30838-S0
> > of framework 20160115-145750-344821002-5050-30838- because it did
> > not register within 5mins
> >
> > Is there any guidance on why the Myriad executor fails to register
> > with the Mesos slave?
> >
> > Thanks,
> > Matt
> >
> > -Original Message-
> > From: Matthew J. Loppatto
> > Sent: Thursday, January 14, 2016 2:25 PM
> > To: 'dev@myriad.incubator.apache.org'
> > Subject: RE: Myriad Vagrant Setup Issue
> >
> > Sarjeet,
> >
> > Thanks for the reply.  I modified the medium profile in my
> > myriad-config-default.yml file to use 1 cpu and 1024 MB mem and am
> > seeing a similar issue in the YARN resource manager log:
> >
> > Offer not sufficient for task with, cpu: 1.4, memory: 2432.0, ports:
> > 1001
> >
> > If I try lowering the medium profile memory below 1024 I get the
> > following message in the log:
> >
> > NodeManager from vagrant-ubuntu-trusty-64 doesn’t satisfy minimum
> > allocations, Sending SHUTDOWN signal to NodeManager.
> >
> > Increasing the memory of the VM to 6 GB also didn't solve the issue.
> > Are there any other measures I can take to resolve the insufficient
> > resource messages?
> >
> > Thanks,
> > Matt
> >
> > -Original Message-
> > From: sarjeet singh [mailto:sarje...@usc.edu]
> > Sent: Thursday, January 14, 2016 12:41 PM
> > To: dev@myriad.incubator.apache.org
> > Subject: Re: Myriad Vagrant Setup Issue
> >
> > Matthew,
> >
> > You can modify profile configurations for Nodemanagers in
> > myriad-config-default.yml and reduce medium (default) NM configuration
> > to match with your VM capacity so a default NM (medium profile) could
> > launch without any issue.
> >
> > - Sarjeet Singh
> >
> > On Thu, Jan 14, 2016 at 10:56 PM, Matthew J. Loppatto <
> > mloppa...@keywcorp.com> wrote:
> >
> > > Hi,
> > >
> > > I'm trying to setup Myriad for an R project at my company but I'm
> > > having some trouble even getting the Vagrant VM working properly.  I
> > > followed the instructions here:
> > >
> > > https://github.com/apache/incubator-myriad/blob/master/docs/vagrant.
> > > md
> > >
> > > with some minor corrections but the Node Manager fails to start.  It
> > > looks like a resource issue based on the log output.  The Mesos UI
> > > shows a slave process with 2 cpu and 2 GB mem, but the log states
> > > the task requires 4 cpu and 5.5 GB mem.
> > >
> > > I've detailed my configuration and log output in this public Gist:
> > >
> > > https://gist.github.com/FearTheParrot/626259c23a854645fcbf
> > >
> > > Would it be possible to provision the Mesos slave with more
> > > resources while also reducing the profile size of the Node Manager?
> > > The Vagrant VM only has 4 GB ram and 2 cpu.
> > >
> > > Any help would be appreciated.
> > >
> > > Thanks!
> > > Matt
> > >
> >
>


0.2.0 release

2016-03-15 Thread Darin Johnson
We've talked about a 0.2.0 release slated for mid April at the dev sync.
I'd like to nail down any features people would like and have time to work
on.

I've been spending some time fixing major bugs in the FGS feature and plan to
address MYRIAD-136 and MYRIAD-189.

I'd also be willing to be the release manager on this release if necessary.

Darin


Re: NM does not start with cgroups enabled

2016-03-15 Thread Darin Johnson
Hey Bjorn,

Can you copy/paste the relevant parts of the Myriad config and yarn-site.xml?
Also, can you ensure you are running the mesos-slave with
--isolation="cpu/cgroups,memory/cgroups"?

I'll try to recreate the problem and/or tell you what's missing in the
config.

Darin

On Mon, Mar 14, 2016 at 6:19 AM, Björn Hagemeier 
wrote:

> Hi all,
>
> I have trouble starting the NM on the slave nodes. Apparently, it does
> not find its configuration or sth. is wrong with the configuration.
>
> With cgroups enabled, the NM does not start; the logs contain the output
> below, indicating that there is sth. wrong in the configuration. However,
> yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
> value used to be "${yarn.nodemanager.linux-container-executor.group}" as
> indicated by the installation documentation, however I'm uncertain
> whether this recursion is the correct approach.
>
>
> ==
> 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> initialize container executor
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
> at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
> Caused by: java.io.IOException: Linux container executor not configured
> properly (error=24)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
> ... 3 more
> Caused by: ExitCodeException exitCode=24: Can't get configured value for
> yarn.nodemanager.linux-container-executor.group.
>
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> at org.apache.hadoop.util.Shell.run(Shell.java:460)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
> ... 4 more
> ==
>
>
> I have given it another try with cgroups disabled (in
> myriad-config-default.yml), I seem to get a little further, but still
> stuck at running Yarn jobs:
>
> ==
> 16/03/14 10:56:34 INFO container.Container: Container
> container_1457949199710_0001_01_01 transitioned from LOCALIZED to
> RUNNING
> 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor:
> launchContainer: [bash,
>
> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_01/default_container_executor.sh]
> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code
> from container container_1457949199710_0001_01_01 is : 1
> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception
> from container-launch with container ID:
> container_1457949199710_0001_01_01 and exit code: 1
> ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> at org.apache.hadoop.util.Shell.run(Shell.java:460)
> at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> at
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from
> container-launch.
> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id:
> container_1457949199710_0001_01_01
> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
> ==
>
> Unfortunately, directory
> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
> is empty, the log indicates that it is being deleted after the failed
> attempt.
>
> Again, any hint would be useful. Also regarding the activation of cgroups.
>
>
> Best regards,
> Björn
>
> --
> Dipl.-Inform. Björn Hagemeier
> 

Re: 0.2.0 release

2016-03-19 Thread Darin Johnson
Happy to report that as of the last two PRs, FGS is usable with no memory leaks
or crashes; it could likely be improved with fancier schedulers, but that's for
the future.  I'm currently looking at running some terasort benchmarks with
FGS and reserved resources vs. statically sized NMs to figure out the
performance hit.  Might be worth a blog post in the near future.
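
(For reference, the benchmark runs are just the stock examples jar; a sketch,
with the version number and data size as placeholder assumptions:

  bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar teragen 100000000 /teraIn
  bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar terasort /teraIn /teraOut

teragen rows are 100 bytes each, so 100000000 rows is roughly 10 GB of input.)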

Adam, I've been looking through the cgroups code for Myriad recently;
apparently we need to mod the path YARN uses for its hierarchy.  Does
that change at all within a Docker container, or is it the same?

Darin

On Mar 16, 2016 8:48 PM, "Adam Bordelon" <a...@mesosphere.io> wrote:

> +1 on Darin as release manager
>
> I'd like to see 0.2 have:
> - Usable FGS
> - Dockerized NM (for multitenancy)
>
> On Tue, Mar 15, 2016 at 9:46 AM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > We've talked about a 0.2.0 release slated for mid April at the dev sync.
> > I'd like to nail down any features people would like and have time to
> work
> > on.
> >
> > I've been spend some time fixing major bugs to the FGS feature and plan
> to
> > address MYRIAD-136 and MYRIAD-189.
> >
> > I'd also be willing to be the release manager on this release if
> necessary.
> >
> > Darin
> >
>


Re: NM does not start with cgroups enabled

2016-03-16 Thread Darin Johnson
What does your container-executor.cfg look like?  Seems like
yarn.nodemanager.linux-container-executor.group isn't set, or possibly
banned.users= hasn't been set (needed on some distros).
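
For reference, a minimal container-executor.cfg looks roughly like this (a
sketch; the values below are illustrative assumptions, not recommendations):

  yarn.nodemanager.linux-container-executor.group=yarn
  banned.users=hdfs,yarn,mapred,bin
  min.user.id=1000
  allowed.system.users=nobody

The exitCode=24 ("Can't get configured value for
yarn.nodemanager.linux-container-executor.group") in the earlier trace is the
container-executor binary failing to read a value out of that file.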

On Tue, Mar 15, 2016 at 12:52 PM, Darin Johnson <dbjohnson1...@gmail.com>
wrote:

> Bjorn,
>
> Your isolation configuration is correct; I was going from memory.  I'll
> take a look at your configs a little later on my test environment and see
> what I can come up with.
>
> Darin
>
> On Tue, Mar 15, 2016 at 12:07 PM, Björn Hagemeier <
> b.hageme...@fz-juelich.de> wrote:
>
>> Dear Darin,
>>
>> thanks for your response.
>>
>> The precise content of /etc/mesos-slave/isolation is:
>>
>> ==
>> cgroups/cpu,cgroups/mem
>> ==
>>
>> Which I took from some documentation, it may have been that of the
>> Puppet module I'm using [1]. Should the values be different? Your string
>> looks a bit different: "cpu/cgroups,memory/cgroups".
>>
>> Please find my yarn-site.xml and myriad-config-default.yml attached. I
>> don't think they contain any sensitive information.
>>
>>
>> Best regards,
>> Björn
>>
>> [1] https://github.com/deric/puppet-mesos
>>
>> Am 15.03.2016 um 16:46 schrieb Darin Johnson:
>> > Hey Bjorn,
>> >
>> > Can you copy/paste the relevant parts of the Myriad config and yarn-site.xml?
>> > Also, can you ensure you are running the mesos-slave with
>> > --isolation="cpu/cgroups,memory/cgroups"?
>> >
>> > I'll try to recreate the problem and/or tell you what's missing in the
>> > config.
>> >
>> > Darin
>> >
>> > On Mon, Mar 14, 2016 at 6:19 AM, Björn Hagemeier <
>> b.hageme...@fz-juelich.de>
>> > wrote:
>> >
>> >> Hi all,
>> >>
>> >> I have trouble starting the NM on the slave nodes. Apparently, it does
>> >> not find it's configuration or sth. is wrong with the configuration.
>> >>
>> >> With cgroups enabled, the NM does not start, the logs contain,
>> >> indicating that there is sth. wrong in the configuratin. However,
>> >> yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
>> >> value used to be "${yarn.nodemanager.linux-container-executor.group}"
>> as
>> >> indicated by the installation documentation, however I'm uncertain
>> >> whether this recursion is the correct approach.
>> >>
>> >>
>> >> ==
>> >> 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting
>> NodeManager
>> >> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
>> >> initialize container executor
>> >> at
>> >>
>> >>
>> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
>> >> at
>> >>
>> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
>> >> Caused by: java.io.IOException: Linux container executor not configured
>> >> properly (error=24)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
>> >> ... 3 more
>> >> Caused by: ExitCodeException exitCode=24: Can't get configured value
>> for
>> >> yarn.nodemanager.linux-container-executor.group.
>> >>
>> >> at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>> >> at org.apache.hadoop.util.Shell.run(Shell.java:460)
>> >> at
>> >>
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
>> >> ... 4 more

Myriad 0.1.1 Release

2016-04-06 Thread Darin Johnson
Hi,

I'm the release manager for Myriad 0.1.1, which we're hoping to get out in
the next couple of weeks. Here's the list of PRs and JIRAs that I think
should go into the release since 0.1.0:

Complete:
#56 MYRIAD-181 Build failure due to dependency on zookeeper test jar
<https://github.com/apache/incubator-myriad/pull/56>
#57 MYRIAD-153: tasks not finishing when FGS is enabled
#60 MYRIAD-186 Clean up the build
<https://github.com/apache/incubator-myriad/pull/60>
#62 Myriad 188 - NodeManager switch to UNHEALTHY causes NPE
<https://github.com/apache/incubator-myriad/pull/62>
#63 MYRIAD-171, MYRIAD-190
<https://github.com/apache/incubator-myriad/pull/63>, compatibility issues
with 2.7.1+ and 2.6.2+

Todo:
Myriad-192

Possibly (Pending determination of issue):
Myriad-194
Myriad-191

This is certainly open for discussion so if you think something should be
added or removed please respond.  This is a fix release so no new features
are to be added.  However, we plan to release 0.2.0 shortly after so new
features can be added then.

Darin


Re: Myriad 0.1.1 Release

2016-04-08 Thread Darin Johnson
Thanks Adam, I think 194 was a typo; I believe I meant 195 (node manager
dying).  Currently I haven't noticed the behavior, so it will depend on the
root cause.
On Apr 8, 2016 4:51 AM, "Adam Bordelon" <a...@mesosphere.io> wrote:

Thanks for heading this up Darin!

Here's a link to the JIRA Issues page for Myriad 0.1.1:
https://issues.apache.org/jira/browse/MYRIAD/fixforversion/12335455/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel

Looking at MYRIAD-194, it seems to be a new feature request, and only
bugfixes should go into a patch release like 0.1.1, so I vote against
its inclusion. MYRIAD-191 could be worth including, depending on the
root-cause.

On Wed, Apr 6, 2016 at 11:06 AM, Darin Johnson <dbjohnson1...@gmail.com>
wrote:
> Hi,
>
> I'm the release manager for Myriad 0.1.1, which we're hoping to get out in
> the next couple weeks. Here's the list of PR's and JIRA's that I think
> should go into the release since 0.1.0:
>
> Complete:
> #56  MYRIAD-181 Build failure due to dependency on zookeeper test jar
> <https://github.com/apache/incubator-myriad/pull/56>
> #57 MYRIAD-153: tasks not finishing when FGS is enabled
> #60 MYRIAD-186 Clean up the build
> <https://github.com/apache/incubator-myriad/pull/60>
> #62 Myriad 188 - NodeManager switch to UNHEALTHY causes NPE
> <https://github.com/apache/incubator-myriad/pull/62>
> #63 MYRIAD-171, MYRIAD-190
> <https://github.com/apache/incubator-myriad/pull/63>, compatibility issues
> with 2.7.1+ and 2.6.2+
>
> Todo:
> Myriad-192
>
> Possibly (Pending determination of issue):
> Myriad-194
> Myriad-191
>
> This is certainly open for discussion so if you think something should be
> added or removed please respond.  This is a fix release so no new features
> are to be added.  However, we plan to release 0.2.0 shortly after so new
> features can be added then.
>
> Darin


Re: NM does not start with cgroups enabled

2016-03-20 Thread Darin Johnson
Hey Bjorn,

I think I figured out the issue.  Some of the values for cgroups are still
hardcoded in Myriad.  I'll add a JIRA ticket; hopefully we can get an update
for 0.2.0.  I'll also respond to this thread after a pull request is
submitted in case you'd like to test it.

Darin
Hi all,

I have trouble starting the NM on the slave nodes. Apparently, it does
not find its configuration or sth. is wrong with the configuration.

With cgroups enabled, the NM does not start; the logs contain the output
below, indicating that there is sth. wrong in the configuration. However,
yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
value used to be "${yarn.nodemanager.linux-container-executor.group}" as
indicated by the installation documentation, however I'm uncertain
whether this recursion is the correct approach.


==
16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
initialize container executor
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
at
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
Caused by: java.io.IOException: Linux container executor not configured
properly (error=24)
at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
at
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
... 3 more
Caused by: ExitCodeException exitCode=24: Can't get configured value for
yarn.nodemanager.linux-container-executor.group.

at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
at org.apache.hadoop.util.Shell.run(Shell.java:460)
at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
at
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
... 4 more
==


I have given it another try with cgroups disabled (in
myriad-config-default.yml), I seem to get a little further, but still
stuck at running Yarn jobs:

==
16/03/14 10:56:34 INFO container.Container: Container
container_1457949199710_0001_01_01 transitioned from LOCALIZED to
RUNNING
16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor:
launchContainer: [bash,
/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_01/default_container_executor.sh]
16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code
from container container_1457949199710_0001_01_01 is : 1
16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception
from container-launch with container ID:
container_1457949199710_0001_01_01 and exit code: 1
ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
at org.apache.hadoop.util.Shell.run(Shell.java:460)
at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from
container-launch.
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id:
container_1457949199710_0001_01_01
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
==

Unfortunately, directory
/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
is empty, the log indicates that it is being deleted after the failed
attempt.

Again, any hint would be useful. Also regarding the activation of cgroups.


Best regards,
Björn

--
Dipl.-Inform. Björn Hagemeier
Federated Systems and Data
Juelich Supercomputing Centre
Institute for Advanced Simulation

Phone: +49 2461 61 1584
Fax  : +49 2461 61 6656
Email: b.hageme...@fz-juelich.de
Skype: bhagemeier
WWW  : http://www.fz-juelich.de/jsc

JSC is the coordinator of the
John von Neumann Institute 

Re: NM does not start with cgroups enabled

2016-03-23 Thread Darin Johnson
Hey Bjorn, sorry for the delay.  Looking at the difference between the
exceptions and my own experience, I believe you left some cgroup configs in
the yarn-site.xml of the node manager.
On Mar 18, 2016 2:58 AM, "Björn Hagemeier" <b.hageme...@fz-juelich.de>
wrote:

> Hi Darin,
>
> thanks a lot for this. But what about the other case below, when cgroups
> is disabled?
>
>
> Björn
>
> Am 18.03.2016 um 00:25 schrieb Darin Johnson:
> > Hey Bjorn,
> >
> > I think I figured out the issue.  Some of the values for cgroups are
> still
> > hardcoded in myriad.  I'll add a JIRA Ticket hopefully we can get an
> update
> > for 0.2.0.  I'll also respond to this thread after a pull request is
> > submitted in case you'd like to test it.
> >
> > Darin
> > Hi all,
> >
> > I have trouble starting the NM on the slave nodes. Apparently, it does
> > not find it's configuration or sth. is wrong with the configuration.
> >
> > With cgroups enabled, the NM does not start, the logs contain,
> > indicating that there is sth. wrong in the configuratin. However,
> > yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
> > value used to be "${yarn.nodemanager.linux-container-executor.group}" as
> > indicated by the installation documentation, however I'm uncertain
> > whether this recursion is the correct approach.
> >
> >
> > ==
> > 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting
> NodeManager
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> > initialize container executor
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
> > at
> > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
> > Caused by: java.io.IOException: Linux container executor not configured
> > properly (error=24)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
> > ... 3 more
> > Caused by: ExitCodeException exitCode=24: Can't get configured value for
> > yarn.nodemanager.linux-container-executor.group.
> >
> > at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> > at org.apache.hadoop.util.Shell.run(Shell.java:460)
> > at
> > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
> > ... 4 more
> > ==
> >
> >
> > I have given it another try with cgroups disabled (in
> > myriad-config-default.yml), I seem to get a little further, but still
> > stuck at running Yarn jobs:
> >
> > ==
> > 16/03/14 10:56:34 INFO container.Container: Container
> > container_1457949199710_0001_01_01 transitioned from LOCALIZED to
> > RUNNING
> > 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor:
> > launchContainer: [bash,
> >
> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_01/default_container_executor.sh]
> > 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code
> > from container container_1457949199710_0001_01_01 is : 1
> > 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception
> > from container-launch with container ID:
> > container_1457949199710_0001_01_01 and exit code: 1
> > ExitCodeException exitCode=1:
> > at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> > at org.apache.hadoop.util.Shell.run(Shell.java:460)
> > at
> > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> >  

Re: 0.2.0 release

2016-03-23 Thread Darin Johnson
Swapnil,

I concur and want to keep both the Mesos and Docker networking options
available, and putting the configuration for both in should be a priority.
However, one has to be careful with this, as the NMs register with the RM
via heartbeats with their container port (not the host port); this isn't an
issue if the NM and RM are in the same Docker network, via Weave or Kubernetes,
but it is with simple bridged networking. We also have to be careful, as Myriad
currently doesn't run HDFS itself, so we'd lose data locality.  My idea was
to start with host networking so we could make Myriad easier to deploy, but
leave room to add additional networking options: basically exposing all the
protobuf options for Docker Parameters (used to configure Docker
networking) and NetworkInfo (used to configure Mesos networking).
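
For concreteness, the fields in question hang off Mesos' ContainerInfo; a rough
sketch in protobuf text form (the image name, parameter, and group below are
placeholder assumptions, not Myriad settings):

  container {
    type: DOCKER
    docker {
      image: "example/yarn-nodemanager"          # placeholder image
      network: HOST                               # what we'd default to first
      parameters { key: "env" value: "FOO=bar" }  # arbitrary docker run flags
    }
    # NetworkInfo is where IP-per-container options would hook in later
    network_infos { groups: "yarn" }
  }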

Darin

On Tue, Mar 22, 2016 at 2:48 PM, Swapnil Daingade <sdaing...@maprtech.com>
wrote:

> Hi Darin,
>
> I feel docker networking is something we should spent time to think
> through.
> A user should be able to use multiple options provided by Mesos, Docker,
> 3rd party etc
>
> It would be great if we can abstract the specific implementation to provide
> container ip addresses behind interfaces. User should be able to switch
> implementations by making simple changes in configuration files.
>
> Regards
> Swapnil
>
>
> On Tue, Mar 22, 2016 at 8:20 AM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > Swapnil,
> >
> > Any help would be appreciated.  I'll try to write up what I'm working on
> > tomorrow.  But essentially the ideas are:
> > 1. Ability to launch the resource manager and node managers in docker
> > containers
> > 2. Use host networking for now (Ports configured to be pulled from mesos
> -
> > ability to use ports reserved by role), but leave hooks to easily add IP
> > per container.
> > 3. Ability to get configuration files for a URI
> > 4. Ability to mount local volumes for local directories in the shuffle
> > phase etc (though will require more config).
> >
> > Darin
> >
>


Re: Myriad talk link for MesosCon?

2016-03-23 Thread Darin Johnson
Yeah I didn't see one either.

Darin

On Wed, Mar 23, 2016 at 1:10 PM, Sarjeet Singh 
wrote:

> I couldn't find any associated link of myriad talk for MesosCon voting.
> Anyone?
>
> Though, I found these proposal doc:
>
> Developers: http://bit.ly/1RpZPvj
> Users: http://bit.ly/1Mspaxp
>
>
> *It seems the deadline for the proposal voting is today, March 23 2016.*
>
> -Sarjeet
>


Re: Challenges after MapR 5.1 Upgrade.

2016-04-04 Thread Darin Johnson
Hey John,

I noticed these lines in your yarn-site.xml:


<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>

If you're attempting to launch a zero-resource nodemanager for FGS, that will
result in the first stack trace.  Both should be explicitly 0 for that
feature to work (the defaults are 1024 and 1 respectively, which will fail).
You do have them set to 0 below; however, I'm uncertain which would take
precedence.
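
In other words, something like this in yarn-site.xml (a sketch), with no other
minimum-allocation entries overriding it:

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>0</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>0</value>
</property>
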
On Apr 4, 2016 5:19 PM, "John Omernik"  wrote:

> This was an upgrade from 5.0.  I will post here. Note: I have removed the
> mapr_shuffle to get node managers to work; however, I am seeing other odd
> things, so any help would be appreciated.
>
> 
> 
>
> 
> 
> yarn.nodemanager.aux-services
> mapreduce_shuffle,myriad_executor
> 
> 
> 
> yarn.resourcemanager.hostname
> myriadprod.marathonprod.mesos
> 
> 
> yarn.nodemanager.aux-services.mapreduce_shuffle.class
> org.apache.hadoop.mapred.ShuffleHandler
> 
> 
> yarn.nodemanager.aux-services.myriad_executor.class
> org.apache.myriad.executor.MyriadExecutorAuxService
> 
> 
> yarn.nm.liveness-monitor.expiry-interval-ms
> 2000
> 
> 
> yarn.am.liveness-monitor.expiry-interval-ms
> 1
> 
> 
> yarn.resourcemanager.nm.liveness-monitor.interval-ms
> 1000
> 
> 
> 
> yarn.nodemanager.resource.cpu-vcores
> ${nodemanager.resource.cpu-vcores}
> 
> 
> yarn.nodemanager.resource.memory-mb
> ${nodemanager.resource.memory-mb}
> 
>
> 
> 
> yarn.scheduler.minimum-allocation-mb
> 512
> 
>
> 
> yarn.scheduler.minimum-allocation-vcores
> 1
> 
>
>
> 
>   
>
> yarn.nodemanager.address
> ${myriad.yarn.nodemanager.address}
> 
> 
> yarn.nodemanager.webapp.address
> ${myriad.yarn.nodemanager.webapp.address}
> 
> 
> yarn.nodemanager.webapp.https.address
> ${myriad.yarn.nodemanager.webapp.address}
> 
> 
> yarn.nodemanager.localizer.address
> ${myriad.yarn.nodemanager.localizer.address}
> 
>
> 
> 
> yarn.resourcemanager.scheduler.class
> org.apache.myriad.scheduler.yarn.MyriadFairScheduler
> 
>
> 
>
> yarn.scheduler.minimum-allocation-vcores
> 0
> 
> 
> yarn.scheduler.minimum-allocation-vcores
> 0
> 
> 
> 
> Who will execute(launch) the containers.
> yarn.nodemanager.container-executor.class
> ${yarn.nodemanager.container-executor.class}
> 
> 
> The class which should help the LCE handle
> resources.
>
>
> yarn.nodemanager.linux-container-executor.resources-handler.class
>
>
> ${yarn.nodemanager.linux-container-executor.resources-handler.class}
> 
> 
>
> yarn.nodemanager.linux-container-executor.cgroups.hierarchy
>
>
> ${yarn.nodemanager.linux-container-executor.cgroups.hierarchy}
> 
> 
>
> yarn.nodemanager.linux-container-executor.cgroups.mount
>
> ${yarn.nodemanager.linux-container-executor.cgroups.mount}
> 
> 
>
> yarn.nodemanager.linux-container-executor.cgroups.mount-path
>
>
> ${yarn.nodemanager.linux-container-executor.cgroups.mount-path}
> 
> 
> yarn.nodemanager.linux-container-executor.group
> ${yarn.nodemanager.linux-container-executor.group}
> 
> 
> yarn.nodemanager.linux-container-executor.path
> ${yarn.home}/bin/container-executor
> 
> 
> yarn.http.policy
> HTTP_ONLY
> 
> 
>
> On Mon, Apr 4, 2016 at 3:53 PM, yuliya Feldman  >
> wrote:
>
> > YarnDefaultProperties.java that defines class for mapr_direct_shuffle
> > should be there even in 5.0, so nothing new there even if maprfs jar is
> > outdated - could you also check that?
> > Also could you paste content of your yarn-site.xml here?
> > Thanks,Yuliya
> >
> >   From: yuliya Feldman 
> >  To: "dev@myriad.incubator.apache.org" 
> >  Sent: Monday, April 4, 2016 1:43 PM
> >  Subject: Re: Challenges after MapR 5.1 Upgrade.
> >
> > Hello John,
> > Did you upgrade to 5.1 or installed new one?
> > Feels like MapR default properties were not loaded - I need to poke
> around
> > and then I will ask you for additional info
> > Thanks,Yuliya
> >
> >   From: John Omernik 
> >  To: dev@myriad.incubator.apache.org
> >  Sent: Monday, April 4, 2016 12:29 PM
> >  Subject: Challenges after MapR 5.1 Upgrade.
> >
> > I had at one point Myriad working fine in MapR 5.0.  I updated to 5.1,
> and
> > repackaged my hadoop tgz for remote distribution and now I have two
> > problems occurring.
> >
> > 1. At first when I had the mapr direct shuffle enabled per the
> > yarn-site.xml on 

Re: 0.2.0 release

2016-03-22 Thread Darin Johnson
Swapnil,

Any help would be appreciated.  I'll try to write up what I'm working on
tomorrow.  But essentially the ideas are:
1. Ability to launch the resource manager and node managers in docker
containers
2. Use host networking for now (Ports configured to be pulled from mesos -
ability to use ports reserved by role), but leave hooks to easily add IP
per container.
3. Ability to get configuration files for a URI
4. Ability to mount local volumes for local directories in the shuffle
phase etc (though will require more config).

Darin


Observations on Fine Grained Scaling

2016-04-13 Thread Darin Johnson
I've been running a number of tests on the Fine Grained scaling aspect on
Myriad.  Here's a few notes:

1. After the patches it seems stable, I'm able to run multiple terasort/pi
jobs and a few scalding jobs without difficulty.
2. Noticed that with jobs with short map tasks (8-12 secs), I rarely got more
than two containers per node; I'm curious whether I'm not consuming resources
fast enough.  The issue goes away on the reduce side (able to get far
better utilization of offers).  The issue can be lessened by increasing
mapred.splits.min.size and mapred.splits.max.size.  This may be an issue
for things like Hive.
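For anyone trying to reproduce the effect, here's a hedged example of bumping
the split sizes for a terasort run.  Note I'm using the standard Hadoop 2.x
property names (mapreduce.input.fileinputformat.split.minsize/maxsize) rather
than the shorthand above; the jar path and byte values are just placeholders:

hadoop jar $YARN_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  terasort \
  -Dmapreduce.input.fileinputformat.split.minsize=536870912 \
  -Dmapreduce.input.fileinputformat.split.maxsize=1073741824 \
  /teragen-input /terasort-output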

Darin


Re: Myriad Releases

2016-04-23 Thread Darin Johnson
Great, I'm calling it.

Also, if anyone wants to test or provide feedback on
https://github.com/apache/incubator-myriad/pull/64 that would be awesome.
It still needs work, but if you have hdfs up it takes about a half hour to
get a Myriad up, I'm working on streamlining it a bit.  Would also like to
get the base docker images hosted on something other than my personal
dockerhub.
On Thu, Apr 21, 2016 at 12:16 AM, Santosh Marella <smare...@maprtech.com>
wrote:

> Agreed. Let's just do a 0.2.0 rather than a 0.1.1.
>
> Santosh
>
> On Wed, Apr 20, 2016 at 6:28 PM, Swapnil Daingade <sdaing...@maprtech.com>
> wrote:
>
> > +1
> >
> > Another change to the roadmap was to move the security work to 0.3
> release.
> >
> > Regards
> > Swapnil
> >
> >
> >
> > On Wed, Apr 20, 2016 at 6:04 PM, Adam Bordelon <a...@mesosphere.io>
> wrote:
> >
> > > +1 to skipping 0.1.1 if 0.2.0 is coming soon enough
> > > I don't think we have any production users eagerly awaiting the 0.1.1
> > fixes
> > >
> > > On Wed, Apr 20, 2016 at 5:52 PM, Darin Johnson <
> dbjohnson1...@gmail.com>
> > > wrote:
> > >
> > > > Hey Zachary! Thanks, for the upvote.
> > > >
> > > > Also if you're looking for projects ping me! I'm going to be adding
> > more
> > > > tickets in the next few days.
> > > >
> > > > On Wed, Apr 20, 2016 at 8:38 PM, Zachary Jaffee <z...@case.edu>
> wrote:
> > > >
> > > > > If I recall the original reason for the 0.1.1 release was that it
> > would
> > > > be
> > > > > able to get it out earlier than the 0.2.0 release. Since it looks
> > like
> > > > they
> > > > > will be released at the same time essentially, the reason to
> release
> > > > 0.1.1
> > > > > over just waiting to release 0.2.0 goes away.
> > > > >
> > > > > On Wed, Apr 20, 2016 at 5:29 PM, Darin Johnson <
> > > dbjohnson1...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > During the dev sync today we discussed the upcoming 0.1.1 and
> 0.2.0
> > > > > > releases.  Currently, the only out standing issue for 0.1.1 is
> > > > MYRIAD-192
> > > > > > (Cgroups), for Myriad 0.2.0 the outstanding issues are Myriad-36
> > and
> > > > > Myriad
> > > > > > 192 (Configuration and Docker/Appc support).  Currently I have a
> > WIP
> > > PR
> > > > > for
> > > > > > Docker Support which I'd like some feedback on (it's should be
> > super
> > > > easy
> > > > > > to test), I'll probably complete Myriad 192 and part of that PR
> as
> > > > it's a
> > > > > > natural fit.  I estimate I can get all patches done by early may
> > and
> > > > > > hopefully get a release or release candidate out by May 11
> > > (ApacheCon).
> > > > > >
> > > > > > Due to the Alpha nature of Myriad and the significant value of
> > Docker
> > > > and
> > > > > > Configuration support, I think most people would opt for 0.2.0
> over
> > > > 0.1.1
> > > > > > and don't feel it's worth the effort to provide both releases at
> > this
> > > > > > time.  I suggest simply doing a 0.2.0 release.  Are there any
> > > > objections?
> > > > > >
> > > > > > Darin
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Zach Jaffee
> > > > > B.S. Computer Science
> > > > > Case Western Reserve University Class of 2017
> > > > > Operations Director | WRUW FM 91.1 Cleveland
> > > > > (917) 881-0646
> > > > > zjaffee.com
> > > > > linkedin.com/in/zjaffee
> > > > > github.com/ZJaffee
> > > > >
> > > >
> > >
> >
>


Re: [Vote] Release apache-myriad-0.2.0-incubating (release candidate 2)

2016-05-19 Thread Darin Johnson
I'm voting +1.
Built and ran multiple map/reduce jobs, plus a few Spark and Flink jobs.

Darin

On Tue, May 17, 2016 at 9:24 PM, Darin Johnson <dbjohnson1...@gmail.com>
wrote:

> Hi All,
>
> I have created a source tar ball for Apache Myriad 0.2.0-incubating,
> release candidate 2 based off the feed back received from release
> candidate 1.  Specifically, the NOTICE file has been updated to 2016 and
> the framework properly shuts down when using the web ui.
>
> Here’s the release notes:
> https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
>
> The commit to be voted upon is tagged with "myriad-0.2.0-incubating-rc2"
> and is available here:
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc2
>
> The artifacts to be voted upon are located below. Please note that this is
> a source release:
>
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc2/
>
> Release artifacts are signed with the following key:
> *https://home.apache.org/~darinj/gpg/2AAE9E3F.asc
> <https://home.apache.org/~darinj/gpg/2AAE9E3F.asc>*
>
> **Please note that the release tar ball does not include the gradlew script
> to build. You need to install gradle in order to build.**
>
> Please try out the release candidate and vote. The vote is open for a
> minimum of 3 business days (Friday May 20) or until the necessary number
> of votes (3 binding +1s)
> is reached.
>
> If/when this vote succeeds, I will call for a vote with IPMC seeking
> permission to release RC1 as Apache Myriad 0.2.0 (incubating).
>
> [ ] +1 Release this package as Apache Myriad 0.2.0-incubating
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
> Thanks,
> Darin
>


[Vote] Release apache-myriad-0.2.0-incubating (release candidate 3)

2016-05-23 Thread Darin Johnson
Hi All,

I have created a source tar ball for Apache Myriad 0.2.0-incubating,
release candidate 3, based off the feedback received from release candidates
1 & 2.  Specifically, this corrects some documentation and a minor typo.

Here’s the release notes:
https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes

The commit to be voted upon is tagged with "myriad-0.2.0-incubating-rc3"
and is available here:
https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc3

The artifacts to be voted upon are located below. Please note that this is
a source release:
https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc3/

Release artifacts are signed with the following key:
https://home.apache.org/~darinj/gpg/2AAE9E3F.asc

**Please note that the release tar ball does not include the gradlew script
to build. You need to install gradle in order to build.**

Please try out the release candidate and vote. The vote is open for a
minimum of 3 business days (Friday May 27) or until the necessary number of
votes (3 binding +1s)
is reached.

If/when this vote succeeds, I will call for a vote with IPMC seeking
permission to release RC3 as Apache Myriad 0.2.0 (incubating).

[ ] +1 Release this package as Apache Myriad 0.2.0-incubating
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...

Thanks,
Darin


Re: [Vote] Release apache-myriad-0.2.0-incubating (release candidate 3)

2016-05-24 Thread Darin Johnson
That was my fault: I pushed the PR to master but not to 0.2.x before I ran
the release script (off 0.2.x).  New release candidate coming momentarily.

On Tue, May 24, 2016 at 9:21 PM, Sarjeet Singh <sarjeetsi...@maprtech.com>
wrote:

> >> Specifically, this corrected some documentation and a minor typo
>
> Darin, RC3 is missing PR#75 changes. I D/L'ed the tar and manually checked
> the changes and wasn't there.
>
> -Sarjeet
>
> On Mon, May 23, 2016 at 9:15 PM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > I have created a source tar ball for Apache Myriad 0.2.0-incubating,
> > release candidate 3 based off the feed back received from release
> candidate
> > 1 & 2.  Specifically, this corrected some documentation and a minor typo.
> >
> > Here’s the release notes:
> > https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes
> >
> > The commit to be voted upon is tagged with "myriad-0.2.0-incubating-rc2"
> > and is available here:
> >
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc
> > <
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc3
> > >
> > 3
> >
> > The artifacts to be voted upon are located below. Please note that this
> is
> > a source release:
> >
> >
> https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc3/
> >
> > Release artifacts are signed with the following key:
> > *https://home.apache.org/~darinj/gpg/2AAE9E3F.asc
> > <https://home.apache.org/~darinj/gpg/2AAE9E3F.asc>*
> >
> > **Please note that the release tar ball does not include the gradlew
> script
> > to build. You need to install gradle in order to build.**
> >
> > Please try out the release candidate and vote. The vote is open for a
> > minimum of 3 business days (Friday May 27) or until the necessary number
> of
> > votes (3 binding +1s)
> > is reached.
> >
> > If/when this vote succeeds, I will call for a vote with IPMC seeking
> > permission to release RC3 as Apache Myriad 0.2.0 (incubating).
> >
> > [ ] +1 Release this package as Apache Myriad 0.2.0-incubating
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> > Thanks,
> > Darin
> >
>


[Vote] Release apache-myriad-0.2.0-incubating (release candidate 4)

2016-05-24 Thread Darin Johnson
Hi All,

I have created a source tar ball for Apache Myriad 0.2.0-incubating,
release candidate 4, based off the feedback received from release candidates
1, 2 & 3.  Thanks Sarjeet for a very thorough review!

Here’s the release notes:
https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes

The commit to be voted upon is tagged with "myriad-0.2.0-incubating-rc4"
and is available here:
https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc4

The artifacts to be voted upon are located below. Please note that this is
a source release:
https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc4/

Release artifacts are signed with the following key:
https://home.apache.org/~darinj/gpg/2AAE9E3F.asc

**Please note that the release tar ball does not include the gradlew script
to build. You need to install gradle in order to build.**

Please try out the release candidate and vote. The vote is open for a
minimum of 3 business days (Friday May 27) or until the necessary number of
votes (3 binding +1s)
is reached.

If/when this vote succeeds, I will call for a vote with IPMC seeking
permission to release RC4 as Apache Myriad 0.2.0 (incubating).

[ ] +1 Release this package as Apache Myriad 0.2.0-incubating
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...

Thanks,
Darin


Spark and Flink

2016-05-19 Thread Darin Johnson
Just wanted to let people know I tried running a Spark and a Flink job
today on Myriad with zero-sized node managers.  It just worked!

This shouldn't be interpreted as it's not going to have issues.  However,
there is some initial progress.
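For anyone who wants to try the same thing: the zero-sized node managers are
just NMs flexed up with the zero profile.  The REST call below is how I
recall doing it - treat the endpoint and payload as an assumption and verify
them against the API page the framework serves on port 8192:

# Assumed endpoint/payload - double check against the Myriad API docs.
curl -X PUT http://myriad-host:8192/api/cluster/flexup \
  -H 'Content-Type: application/json' \
  -d '{"instances": 1, "profile": "zero"}'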

Darin


Re: gradle Issue when building RM docker on MacOSX

2016-05-22 Thread Darin Johnson
I've seen that error when I used a terminal that wasn't loaded with the
docker-machine environment.  I think you can also solve it by evaluating the
environment (`eval $(docker-machine env)`).
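Roughly, from a fresh terminal on the Mac that would be something like this
(assuming the default docker-machine VM name):

docker-machine start default
eval $(docker-machine env default)   # loads DOCKER_HOST etc. into this shell
./gradlew -P dockerTag=sarjeet/myriad buildRMDocker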
On May 22, 2016 8:37 PM, "sarjeet singh"  wrote:

Observed the following issue when trying to build the RM docker image from a
Mac (local):

ssingh-mbpro:docker ssingh$ ./gradlew -P dockerTag=sarjeet/myriad
buildRMDocker

   [***output formatted***]

Building image using context
'/Users/ssingh/Myriad/myriad-0.2.0/myriad-0.2.0-incubating-rc2/docker'.

Using tag 'sarjeet/myriad' for image.

java.lang.UnsatisfiedLinkError: Could not find library in classpath, tried:
[libjunixsocket-macosx-1.8-x86_64.dylib,
libjunixsocket-macosx-1.5-x86_64.dylib]

at org.newsclub.net.unix.NativeUnixSocket.load(NativeUnixSocket.java:81)

at
org.newsclub.net.unix.NativeUnixSocket.(NativeUnixSocket.java:112)

at org.newsclub.net.unix.AFUNIXSocket.(AFUNIXSocket.java:36)

at org.newsclub.net.unix.AFUNIXSocket.newInstance(AFUNIXSocket.java:50)

at
com.github.dockerjava.jaxrs.ApacheUnixSocket.(ApacheUnixSocket.java:53)

at
com.github.dockerjava.jaxrs.UnixConnectionSocketFactory.createSocket(UnixConnectionSocketFactory.java:65)

at
org.apache.http.impl.conn.HttpClientConnectionOperator.connect(HttpClientConnectionOperator.java:108)

at
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:314)

at
org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:357)

at
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:218)

at
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:194)

at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:85)

at
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)

at
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)

at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)

at
com.github.dockerjava.jaxrs.connector.ApacheConnector.apply(ApacheConnector.java:443)

at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:246)

at
org.glassfish.jersey.client.JerseyInvocation$2.call(JerseyInvocation.java:683)

at org.glassfish.jersey.internal.Errors.process(Errors.java:315)

at org.glassfish.jersey.internal.Errors.process(Errors.java:297)

at org.glassfish.jersey.internal.Errors.process(Errors.java:228)

at
org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:424)

at
org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:679)

at
org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:435)

at
org.glassfish.jersey.client.JerseyInvocation$Builder.post(JerseyInvocation.java:338)

at
com.github.dockerjava.jaxrs.async.POSTCallbackNotifier.response(POSTCallbackNotifier.java:29)

at
com.github.dockerjava.jaxrs.async.AbstractCallbackNotifier.call(AbstractCallbackNotifier.java:45)

at
com.github.dockerjava.jaxrs.async.AbstractCallbackNotifier.call(AbstractCallbackNotifier.java:22)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

[pool-1-thread-1] ERROR
com.github.dockerjava.core.async.ResultCallbackTemplate - Error during
callback

java.lang.NoClassDefFoundError: Could not initialize class
org.newsclub.net.unix.NativeUnixSocket

at org.newsclub.net.unix.AFUNIXSocketImpl.connect(AFUNIXSocketImpl.java:134)

at org.newsclub.net.unix.AFUNIXSocket.connect(AFUNIXSocket.java:97)

at
com.github.dockerjava.jaxrs.ApacheUnixSocket.connect(ApacheUnixSocket.java:64)

at
com.github.dockerjava.jaxrs.UnixConnectionSocketFactory.connectSocket(UnixConnectionSocketFactory.java:73)

at
org.apache.http.impl.conn.HttpClientConnectionOperator.connect(HttpClientConnectionOperator.java:118)

at
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:314)

at
org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:357)

at
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:218)

at
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:194)

at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:85)

at
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)

at
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)

at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)

at
com.github.dockerjava.jaxrs.connector.ApacheConnector.apply(ApacheConnector.java:443)

at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:246)

at
org.glassfish.jersey.client.JerseyInvocation$2.call(JerseyInvocation.java:683)

at 

Re: Need help with cgroup troubleshooting or setup issue with NM launch.

2016-05-21 Thread Darin Johnson
Sarjeet:

Can you try adding this to your yarn-site.xml:

<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>${yarn.nodemanager.linux-container-executor.cgroups.hierarchy}</value>
</property>

this should change the hierarchy to
/sys/fs/cgroup/cpu/mesos/XXX-TASK-ID-XXX, which will be writable,

which explains the error:


Caused by: java.io.IOException: Not able to enforce cpu weights; cannot
write to cgroup at: /sys/fs/cgroup/cpu

The node manager will now add tasks to:

/sys/fs/cgroup/cpu/mesos/XXX-TASK-ID-XXX
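A quick way to sanity-check this once the NM task is up (illustrative only;
substitute the real task id and the user your NM runs as - mapr in your case):

TASK_ID=afe954c5-79dc-4238-af84-14855090df34
ls -ld /sys/fs/cgroup/cpu/mesos/$TASK_ID
# Creating and removing a child cgroup is the simplest write test:
sudo -u mapr mkdir /sys/fs/cgroup/cpu/mesos/$TASK_ID/write-test \
  && sudo -u mapr rmdir /sys/fs/cgroup/cpu/mesos/$TASK_ID/write-test \
  && echo "hierarchy is writable"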

I'll go check that to ensure that's in the documentation.

Thanks,

Darin



On Sat, May 21, 2016 at 4:56 AM, Sarjeet Singh 
wrote:

> When trying cgroups on myriad-0.2 RC on a single node mapr cluster, I am
> getting the following issue:
>
> 1. The below errors is when launching NodeManager with cgroups enabled:
>
> *stdout*:
>
> export TASK_DIR=afe954c5-79dc-4238-af84-14855090df34&& sudo chown mapr
> /sys/fs/cgroup/cpu/mesos/afe954c5-79dc-4238-af84-14855090df34 && export
> YARN_HOME=/opt/mapr/hadoop/hadoop-2.7.0; env
> YARN_NODEMANAGER_OPTS=-Dcluster.name.prefix=/cluster1
> -Dnodemanager.resource.io-spindles=4.0
> -Dyarn.nodemanager.linux-container-executor.cgroups.hierarchy=mesos/
> afe954c5-79dc-4238-af84-14855090df34
> -Dyarn.home=/opt/mapr/hadoop/hadoop-2.7.0
> -Dnodemanager.resource.cpu-vcores=4 -Dnodemanager.resource.memory-mb=4096
> -Dmyriad.yarn.nodemanager.address=0.0.0.0:31847
> -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31132
> -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31181
> -Dmyriad.mapreduce.shuffle.port=31166
> YARN_HOME=/opt/mapr/hadoop/hadoop-2.7.0
> /opt/mapr/hadoop/hadoop-2.7.0/bin/yarn nodemanager
>
>
> *stderr*:
>
> 16/05/21 01:43:13 INFO service.AbstractService: Service NodeManager failed
> in state INITED; cause:
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> initialize container executor
>
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> initialize container executor
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:214)
>
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
>
> Caused by: java.io.IOException: Not able to enforce cpu weights; cannot
> write to cgroup at: /sys/fs/cgroup/cpu
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.initializeControllerPaths(CgroupsLCEResourcesHandler.java:493)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:152)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:135)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:192)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
>
> ... 3 more
>
> 16/05/21 01:43:13 WARN service.AbstractService: When stopping the service
> NodeManager : java.lang.NullPointerException
>
> java.lang.NullPointerException
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:164)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:276)
>
> at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>
> at
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>
> at
>
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
>
> 16/05/21 01:43:13 FATAL nodemanager.NodeManager: Error starting NodeManager
>
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> initialize container executor
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:214)
>
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
>
> Caused by: java.io.IOException: Not able to enforce cpu weights; cannot
> write to cgroup at: /sys/fs/cgroup/cpu
>
> at
>
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.initializeControllerPaths(CgroupsLCEResourcesHandler.java:493)
>
> at
>
> 

Myriad PR's

2016-05-09 Thread Darin Johnson
In preparation for the 0.2.0 release I'm going to start merging PR's I've
looked over Mohit's, If anyone want's to look over any of the PR's that'd
be great, otherwise I'm going to assume anything that's been up for over 3
business days with no negative comments is considered to be OK unless
someone has objections.

Darin


[Vote] Release apache-myriad-0.2.0-incubating (release candidate 2)

2016-05-17 Thread Darin Johnson
Hi All,

I have created a source tar ball for Apache Myriad 0.2.0-incubating,
release candidate 2, based off the feedback received from release candidate
1.  Specifically, the NOTICE file has been updated to 2016 and the
framework properly shuts down when using the web ui.

Here’s the release notes:
https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes

The commit to be voted upon is tagged with "myriad-0.2.0-incubating-rc2"
and is available here:
https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc2

The artifacts to be voted upon are located below. Please note that this is
a source release:
https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc2/

Release artifacts are signed with the following key:
https://home.apache.org/~darinj/gpg/2AAE9E3F.asc

**Please note that the release tar ball does not include the gradlew script
to build. You need to install gradle in order to build.**

Please try out the release candidate and vote. The vote is open for a
minimum of 3 business days (Friday May 20) or until the necessary number of
votes (3 binding +1s)
is reached.

If/when this vote succeeds, I will call for a vote with IPMC seeking
permission to release RC2 as Apache Myriad 0.2.0 (incubating).

[ ] +1 Release this package as Apache Myriad 0.2.0-incubating
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...

Thanks,
Darin


[Vote] Release apache-myriad-0.2.0-incubating (release candidate 1)

2016-05-13 Thread Darin Johnson
Hi All,

Firstly, thanks everyone for the valuable contributions to the project and
for holding on tight as we move along the release process.

I have created a source tar ball for Apache Myriad 0.2.0-incubating,
release candidate 1.

Here’s the release notes:
https://cwiki.apache.org/confluence/display/MYRIAD/Release+Notes

The commit to be voted upon is tagged with "myriad-0.2.0-incubating-rc1"
and is available here:
https://git-wip-us.apache.org/repos/asf?p=incubator-myriad.git;a=shortlog;h=refs/tags/myriad-0.2.0-incubating-rc1

The artifacts to be voted upon are located below. Please note that this is
a source release:
https://dist.apache.org/repos/dist/dev/incubator/myriad/myriad-0.2.0-incubating-rc1/

Release artifacts are signed with the following key:
https://home.apache.org/~darinj/gpg/2AAE9E3F.asc

**Please note that the release tar ball does not include the gradlew script
to build. You need to install gradle in order to build.**

Please try out the release candidate and vote. The vote is open for a
minimum of 3 business days (Wednesday May 18) or until the necessary number
of votes (3 binding +1s)
is reached.

If/when this vote succeeds, I will call for a vote with IPMC seeking
permission to release RC1 as Apache Myriad 0.2.0 (incubating).

[ ] +1 Release this package as Apache Myriad 0.2.0-incubating
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...

Thanks,
Darin


Myriad Release update

2016-05-11 Thread Darin Johnson
OK, I got the updates merged.  I'm going to be updating documentation and
testing tomorrow.  If all goes well we should have a release tomorrow
evening.

Darin


Re: cgroups suggestions

2016-05-04 Thread Darin Johnson
Santosh, that is the behavior I'm seeing.
On May 4, 2016 6:13 PM, "Santosh Marella" <smare...@maprtech.com> wrote:

> > The second involves the cgroup hierarchy and the cgroup mount point.
> Here
> > the code attempts to create a hierarchy in $CGROUP_DIR/mesos/$TASK_ID.
> > This is problematic as mesos will not unmount the hierarchy when the task
> > finished (in this case the node manager)
>
> IIRC, when a task is launched by mesos, the agent creates
> $CGROUP_DIR/mesos/$TASK_ID mount point to enforce cpu/mem for that task.
> Once the task finishes, the agent should unmount the $TASK_ID. Are you
> saying
> that's not happening for NMs ?
>
> Santosh
>
> On Wed, May 4, 2016 at 10:30 AM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > I've been digging into groups support, there's a few things that are easy
> > fixes but a few things become problematic so I'd like to discuss.
> >
> > First the code makes certain options dictated that can be placed in the
> > yarn-site.xml - this should be done to remove code and provide
> > flexibility.  That's easy.
> >
> > The second involves the cgroup hierarchy and the cgroup mount point.
> Here
> > the code attempts to create a hierarchy in $CGROUP_DIR/mesos/$TASK_ID.
> > This is problematic as mesos will not unmount the hierarchy when the task
> > finished (in this case the node manager), it is also therefore unable to
> > unmount it's own task hierarchy (This also creates the need to chmod a
> > number of directories as a superuser).  This leads to issues.  An
> > alternative approach would be to use the container-executor program
> > (already suid w/ yarn's group) to create the hierarchy as
> > $CGROUP_DIR/frameworkname if it doesn't exist, this may open another can
> of
> > worms as I haven't tested fully.
> >
> > Any thoughts or suggestions would be appreciated.
> >
> > Darin
> >
>


cgroups suggestions

2016-05-04 Thread Darin Johnson
I've been digging into cgroups support; there's a few things that are easy
fixes, but a few things become problematic, so I'd like to discuss.

First, the code hard-codes certain options that could instead be placed in
the yarn-site.xml - moving them there would remove code and provide
flexibility.  That's easy.

The second involves the cgroup hierarchy and the cgroup mount point.  Here
the code attempts to create a hierarchy in $CGROUP_DIR/mesos/$TASK_ID.
This is problematic as mesos will not unmount the hierarchy when the task
finishes (in this case the node manager), and it is therefore unable to
unmount its own task hierarchy (this also creates the need to chmod a
number of directories as a superuser).  This leads to issues.  An
alternative approach would be to use the container-executor program
(already suid w/ yarn's group) to create the hierarchy as
$CGROUP_DIR/frameworkname if it doesn't exist, this may open another can of
worms as I haven't tested fully.

Any thoughts or suggestions would be appreciated.

Darin


Re: cgroups suggestions

2016-05-05 Thread Darin Johnson
It turns out everything works if you set permissions appropriately on
$CGROUP_ROOT/mesos/$TASKID/ so the yarn user can write to the hierarchy.
Then it all behaves exactly as expected.

I spent a while running through the container-executor code: when it mounts a
cgroup subsystem it changes the ownership of the hierarchy to the yarn user.
The original cgroups code in myriad attempted to do something similar by
chmoding the directory, but assumed the yarn user would be a member of group
root.  Also, when that code was written the chmod happened as root; currently
that is ineffective, as the standard framework user does not necessarily have
permission to modify $CGROUP_ROOT/mesos/$TASKID.  However, we have a
mechanism for using a frameworkSuperUser which can do this (my current hack).
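Concretely, the hack amounts to having the framework superuser hand the
task's cgroup over to the framework user before the NM starts, along the
lines of (sketch only - the task id and user names are examples):

# Run as the framework superuser; $TASK_ID is the NM's Mesos task id and
# yarn is the framework user the node manager runs as.
sudo chown -R yarn /sys/fs/cgroup/cpu/mesos/$TASK_ID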

The current code also sets
yarn.nodemanager.linux-container-executor.cgroups.mount-path=/sys/fs/cgroup
and yarn.nodemanager.linux-container-executor.cgroups.mount=true, and the
documentation then requires edits to yarn-site.xml to get these passed
through.

Now that I've got things working, I'll start cleaning up the original code
to provide an cleaner setup and adjust the documentation as necessary, I
should have a PR soon.


Re: NM does not start with cgroups enabled

2016-05-05 Thread Darin Johnson
Bjorn, I don't know if you're still experimenting with Myriad, but I
believe I've got a fix for your issue.  I'm going to try to get it in our
next release, so if you have any feedback it would be great.  I verified it
on a couple small systems.

https://github.com/apache/incubator-myriad/pull/69

On Wed, Mar 23, 2016 at 8:17 AM, Darin Johnson <dbjohnson1...@gmail.com>
wrote:

> Hey, Bjorn sorry for the delay, looking at the difference between the
> exceptions and my own experience I believe you left some cgroup configs in
> yarn-site.xml of the node manager.
> On Mar 18, 2016 2:58 AM, "Björn Hagemeier" <b.hageme...@fz-juelich.de>
> wrote:
>
>> Hi Darin,
>>
>> thanks a lot for this. But what about the other case below, when cgroups
>> is disabled?
>>
>>
>> Björn
>>
>> On 18.03.2016 at 00:25, Darin Johnson wrote:
>> > Hey Bjorn,
>> >
>> > I think I figured out the issue.  Some of the values for cgroups are
>> still
>> > hardcoded in myriad.  I'll add a JIRA Ticket hopefully we can get an
>> update
>> > for 0.2.0.  I'll also respond to this thread after a pull request is
>> > submitted in case you'd like to test it.
>> >
>> > Darin
>> > Hi all,
>> >
>> > I have trouble starting the NM on the slave nodes. Apparently, it does
>> > not find it's configuration or sth. is wrong with the configuration.
>> >
>> > With cgroups enabled, the NM does not start, the logs contain,
>> > indicating that there is sth. wrong in the configuratin. However,
>> > yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
>> > value used to be "${yarn.nodemanager.linux-container-executor.group}" as
>> > indicated by the installation documentation, however I'm uncertain
>> > whether this recursion is the correct approach.
>> >
>> >
>> > ==
>> > 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting
>> NodeManager
>> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
>> > initialize container executor
>> > at
>> >
>> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
>> > at
>> > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>> > at
>> >
>> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
>> > at
>> >
>> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
>> > Caused by: java.io.IOException: Linux container executor not configured
>> > properly (error=24)
>> > at
>> >
>> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
>> > at
>> >
>> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
>> > ... 3 more
>> > Caused by: ExitCodeException exitCode=24: Can't get configured value for
>> > yarn.nodemanager.linux-container-executor.group.
>> >
>> > at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>> > at org.apache.hadoop.util.Shell.run(Shell.java:460)
>> > at
>> >
>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>> > at
>> >
>> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
>> > ... 4 more
>> > ==
>> >
>> >
>> > I have given it another try with cgroups disabled (in
>> > myriad-config-default.yml), I seem to get a little further, but still
>> > stuck at running Yarn jobs:
>> >
>> > ==
>> > 16/03/14 10:56:34 INFO container.Container: Container
>> > container_1457949199710_0001_01_01 transitioned from LOCALIZED to
>> > RUNNING
>> > 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor:
>> > launchContainer: [bash,
>> >
>> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_01/default_container_executor.sh]
>> > 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code
>> > from container cont

Re: Hello Guys

2016-04-21 Thread Darin Johnson
Hey Sam,

I'm already running Myriad alongside a CM-managed hadoop cluster.  It's a
little hacky right now; I'm working on streamlining this, and it may involve
CM and/or some docker integration.  Here are my current steps:

0. I strongly recommend pulling off master and building from source - it has
some really useful patches.  We're working on another release now (a couple
of weeks out though).

1. Let Cloudera Manager configure hdfs - it does a good job of this.

2. Grab a cloudera tar ball from here:
http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_eom.html#topic_3
(I've also just used apache-hadoop tarballs).

3. Extract the tar ball, then copy the native libraries from the cdh install
on your system into hadoop-*/lib/native

4 cp myriad/myriad-*/build/libs/* hadoop-*/share/hadoop/yarn/lib

5 copy your hadoop configs to hadoop-*/etc/hadoop/

6 create a myriad-config-default.yml in hadoop-*/etc/hadoop; follow the
instructions on the wiki for remote distribution for the edits to
myriad-config-default.yml (a hedged sketch of those edits follows the steps
below).  NB: don't enable yarn cgroups yet - I'm fixing a bug.

7 chown -R root:root hadoop-*

8 chown root:yarn hadoop-*/bin/container-executor ; chmod g+s
hadoop*/bin/container-executor

9 mv hadoop-<..> hadoop-myriad

10 tar -zcvf hadoop-myriad.tgz hadoop-myriad ; hadoop fs -put
hadoop-myriad.tgz /dist/

11 cd hadoop-myriad/ && sudo -u yarn bin/yarn resourcemanager

12 hit the web-ui at host:8192 and flexup some node managers
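
For step 6, the remote-distribution bits of myriad-config-default.yml end up
looking roughly like the sketch below.  The key names are from memory and
from the wiki page, and the hosts/paths are placeholders - check everything
against the myriad-config-default.yml shipped in the source tree:

# Hedged sketch only - verify key names against the file in the source tree.
mesosMaster: zk://zkhost:2181/mesos
frameworkUser: yarn
frameworkSuperUser: root
executor:
  nodeManagerUri: hdfs:///dist/hadoop-myriad.tgz
yarnEnvironment:
  YARN_HOME: hadoop-myriad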

This has been pretty stable.

Alternatively, if you're running mesos and docker you could look at PR
https://github.com/apache/incubator-myriad/pull/64, it's still WIP but
avoids a lot of setup - would be happy to work with you as I document and
harden this feature for the 0.2.0 release.  Currently that runs off generic
hadoop, but we could certainly create distribution-specific dockerfiles.

Let me know if you need help.  Keep in mind this is still an alpha project,
so expect some issues. Would love to get feedback, use cases and feature
ideas.

Also a mesos tip: if you're running services like hdfs outside of mesos you
should adjust your mesos resources appropriately or you'll end up
oversubscribed and processes will slow down or die.

Thanks for trying myriad!

Darin


On Apr 21, 2016 4:22 AM, "rchen"  wrote:

Hi Guys,
Currently, we are working on Myriad with the Cloudera distribution, and we
have proved that Mesos can work with Yarn.
However, talking about CM, how do we integrate?  Any comments are welcome and
appreciated.


Regards
Sam


Re: Myriad Releases

2016-04-20 Thread Darin Johnson
Hey Zachary! Thanks for the upvote.

Also if you're looking for projects ping me! I'm going to be adding more
tickets in the next few days.

On Wed, Apr 20, 2016 at 8:38 PM, Zachary Jaffee <z...@case.edu> wrote:

> If I recall the original reason for the 0.1.1 release was that it would be
> able to get it out earlier than the 0.2.0 release. Since it looks like they
> will be released at the same time essentially, the reason to release 0.1.1
> over just waiting to release 0.2.0 goes away.
>
> On Wed, Apr 20, 2016 at 5:29 PM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > During the dev sync today we discussed the upcoming 0.1.1 and 0.2.0
> > releases.  Currently, the only out standing issue for 0.1.1 is MYRIAD-192
> > (Cgroups), for Myriad 0.2.0 the outstanding issues are Myriad-36 and
> Myriad
> > 192 (Configuration and Docker/Appc support).  Currently I have a WIP PR
> for
> > Docker Support which I'd like some feedback on (it's should be super easy
> > to test), I'll probably complete Myriad 192 and part of that PR as it's a
> > natural fit.  I estimate I can get all patches done by early may and
> > hopefully get a release or release candidate out by May 11 (ApacheCon).
> >
> > Due to the Alpha nature of Myriad and the significant value of Docker and
> > Configuration support, I think most people would opt for 0.2.0 over 0.1.1
> > and don't feel it's worth the effort to provide both releases at this
> > time.  I suggest simply doing a 0.2.0 release.  Are there any objections?
> >
> > Darin
> >
>
>
>
> --
> Zach Jaffee
> B.S. Computer Science
> Case Western Reserve University Class of 2017
> Operations Director | WRUW FM 91.1 Cleveland
> (917) 881-0646
> zjaffee.com
> linkedin.com/in/zjaffee
> github.com/ZJaffee
>


Myriad Releases

2016-04-20 Thread Darin Johnson
During the dev sync today we discussed the upcoming 0.1.1 and 0.2.0
releases.  Currently, the only outstanding issue for 0.1.1 is MYRIAD-192
(Cgroups); for Myriad 0.2.0 the outstanding issues are Myriad-36 and Myriad
192 (Configuration and Docker/Appc support).  Currently I have a WIP PR for
Docker Support which I'd like some feedback on (it should be super easy
to test); I'll probably complete Myriad 192 and part of that PR as it's a
natural fit.  I estimate I can get all patches done by early May and
hopefully get a release or release candidate out by May 11 (ApacheCon).

Due to the Alpha nature of Myriad and the significant value of Docker and
Configuration support, I think most people would opt for 0.2.0 over 0.1.1
and don't feel it's worth the effort to provide both releases at this
time.  I suggest simply doing a 0.2.0 release.  Are there any objections?

Darin


Re: Observations on Fine Grained Scaling

2016-04-13 Thread Darin Johnson
Santosh, I get a lot of 2-3 containers.  But I can only get 9-12 containers
(I topped out the cpu resources at 12 cores) if a task runs for more than 30
secs (preferably 60 secs).  That's generally not an issue, but I thought it
was worth putting on the list for general knowledge.  It's also less of a
deal when more jobs are running.

I have a few ideas on how to improve it and data locality, but think that
it'll likely involve refactoring YarnNodeCapacityManager and
OfferLifeCycleManager to interfaces that can be extended to handle
different strategies which can be configured at startup.  I'd love to
start that discussion once we finish getting the basic mechanics working.
Maybe a 0.3.0 or 0.4.0 release?

Darin


On Wed, Apr 13, 2016 at 2:14 PM, Santosh Marella <smare...@maprtech.com>
wrote:

> > After the patches it seems stable, I'm able to run multiple terasort/pi
> > jobs and a few scalding jobs without difficulty.
> Great work, Darin. Glad to see FGS is now stable.
>
> >Noticed with jobs with short map tasks (8-12 secs), I rarely got more
> > than two containers per node, I'm curious if I'm not consuming resources
> > fast enough.
> Yes. Perhaps we need to tune the rate at which Mesos sends out resource
> offers
> to frameworks. The default that we observe in Myriad is 5 seconds. However,
> if your
> job has many map tasks and Mesos offer is big enough to accommodate several
> of them,
> then you should ideally see lot more than 2-3 containers per node.
>
> Isn't that happening? How many map tasks does your job have?
>
> Thanks,
> Santosh
>
> On Wed, Apr 13, 2016 at 8:34 AM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > I've been running a number of tests on the Fine Grained scaling aspect on
> > Myriad.  Here's a few notes:
> >
> > 1. After the patches it seems stable, I'm able to run multiple
> terasort/pi
> > jobs and a few scalding jobs without difficulty.
> > 2. Noticed with jobs with short map tasks (8-12 secs), I rarely got more
> > than two containers per node, I'm curious if I'm not consuming resources
> > fast enough.  The issue goes away on the reduce side (able to get far
> > better utilization of offers).  The issue can be lessened by increasing
> > mapred.splits.min.size and mapred.splits.max.size.  This may be an issue
> > for things like Hive.
> >
> > Darin
> >
>


Re: 答复: Hello Guys

2016-04-27 Thread Darin Johnson
Yongyu,

Sounds like you've made some progress! I'm not sure you'll be able to get
the cluster completely under CM management, since the node managers launched
by Myriad aren't running the Cloudera agent (I don't think we'll ever
officially support that, but you could tweak the source to run it in the
background; if you could make it a general process it might be a feature
we'd add).  Though if you're launching the Resource Manager via CM I'd
expect it to show up.  I haven't attempted to put my resource manager under
CM though.

Darin

On Tue, Apr 26, 2016 at 4:45 AM, 陈泳宇 <yc...@linkernetworks.com> wrote:

> Hi Darin, This is Yongyu from Linkernetworks. I have proved that myriad
> can work well with CDH. Currently, I am also trying to bring our hadoop
> cluster, running myriad, under the management of CM.
>
> I noticed that you have already completed the mission, but I still have some
> problems here: the status shown on the CM dashboard will not change…
>
> Here are my steps:
>
> 1.   Add YARN service on CM dashboard..
>
> 2.   Stop the YARN service
>
> 3.   Cd to /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45
>
> 4.   Copy native libs, myriad jars, as well as the config files.
>
> 5.   Tar this package and upload it to hdfs.
>
> 6.   Start the resourcemanager.
>
> Myriad service will be launched successfully, but nothing changes on the
> dashboard, which means the cluster is not under the management of CM.
>
>
>
>
>
> Regards,
>
> Yongyu
>
>
> Begin forwarded message:
>
> *From:* Darin Johnson <dbjohnson1...@gmail.com>
> *Date:* April 21, 2016 at 8:50:22 PM GMT+8
> *To:* Dev <dev@myriad.incubator.apache.org>
> *Subject:* *Re: Hello Guys*
> *Reply-To:* dev@myriad.incubator.apache.org
>
> Hey Sam,
>
> I'm already myriad alongside a CM managed hadoop cluster.  It's a little
> hacky right now, I'm working on stream lining this, it may involve CM
> and/or some docker integration.  Here are my current steps:
>
> 0. I strongly recommend pulling of master and building from source - it has
> some really useful patches.  We're working on another release now (couple
> weeks out though).
>
> 1. Let Cloudera Manager configure hdfs - it does a good job of this.
>
> 2. Grab a cloudera tar ball from here:
>
> http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_eom.html#topic_3
> (I've also just used apache-hadoop tarballs).
>
> 3. Extract the tar ball, copy the native libraries cdh install on your
> system in hadoop-*/lib/native
>
> 4 cp myriad/myriad-*/build/libs/* hadoop-*/share/hadoop/yarn/lib
>
> 5 copy your hadoop configs to hadoop-*/etc/hadoop/
>
> 6 create a myriad-default-config.yml in hadoop-*/etc/hadoop follow
> instructions on wiki for remote distribution for edits to
> myriad-default-config.yml.  NB: don't enable yarn cgroups yet - I'm fixing
> a bug.
>
> 7 chown -R root:root hadoop-*y
>
> 8 chown root:yarn hadoop-*/bin/container-executor ; chmod g+s
> hadoop*/bin/container-executor
>
> 9 mv hadoop-<..> hadoop-myriad
>
> 10 tar -zxvf hadoop-myriad.tgz hadoop-myriad ; hadoop fs -put
> hadoop-myriad.tgz /dist/
>
> 11 cd hadoop-myriad/ && sudo -u yarn bin/yarn resource manager
>
> 12 hit the web-ui at host:8192 and flexup some node managers
>
> This has been pretty stable.
>
> Alternatively, if you're running mesos and docker you could look at PR
> https://github.com/apache/incubator-myriad/pull/64, it's still WIP but
> avoids a lot of setup - would be happy to work with you as I document and
> harden this feature for the 0.2.0 release.  Currently that runs off generic
> hadoop put we could certainly create distribution specific dockerfiles.
>
> Let me know if you need help.  Keep in mind this is still an alpha project,
> so expect some issues. Would love to get feedback, use cases and feature
> ideas.
>
> Also a mesos tip: if your running services like hdfs outside of mesos you
> should adjust you're mesos resources appropriately or you'll end up over
> subscribed and processes will slow or die.
>
> Thanks for trying myriad!
>
> Darin
>
>
> On Apr 21, 2016 4:22 AM, "rchen" <rc...@linkernetworks.com> wrote:
>
> Hi Guys,
> Currrently, we are working on Myriad with Cloudera distribution. And we
> have proved that Mesos and work with Yarn.
> However talking about CM, how to integrate ? Any comments is welcome.
> appreciated.
>
>
> Regards
> Sam
>
>


Sync tomorrow?

2016-07-26 Thread Darin Johnson
Is there going to be a sync tomorrow?

Darin


Re: vagrant install doesn't show new framework registering

2016-07-26 Thread Darin Johnson
Hey David,

Thanks for the info.  I haven't used the vagrant install for a while, so it
may be good for me to start a fresh instance to check it.  In the meantime
though any notes you have would be great!  We'd be happy to update the
documentation.

On Jul 26, 2016 6:24 PM, "Reno, David"  wrote:

As a follow-up, registration failure seems to have been based on using the
wrong mesosMaster IP address or format. I changed it to
"zk://10.0.2.15:2181/mesos" following syntax from this list archive:
https://mail-archives.apache.org/mod_mbox/myriad-dev/201602.mbox/%3c1519159574.1366234.1456242921704.javamail.ya...@mail.yahoo.com%3e

The mesos master now shows MyriadAlpha as an active framework. Still, the
myriad tasks list shows the default medium as a pending task, so there
still seems to be a problem. The Mesos slave does not show any frameworks
or completed frameworks.

Again, just trying to test-drive the vanilla vagrant install. Happy to
provide notes of what works and doesn’t if anyone wants to update the
vagrant install docs to Myriad 0.2.0:
https://cwiki.apache.org/confluence/display/MYRIAD/Installing+using+Vagrant

Further detail on users:
step 1 seems best to complete as the vagrant user
remaining steps seem to need to be completed as the hadoop user (i.e.
hduser)

Regards,
David

> On Jul 26, 2016, at 9:27 AM, Reno, David  wrote:
>
> Hi Myriaders,
>
> Sorry if I’m reaching out to the wrong alias or help, this is all I see.
I’m getting stuck with the myriad install with vagrant. The wiki seems to
assume 0.1.0 though I’ve cloned the latest 0.2.0 release from github.
>
> I’m following these instructions:
https://cwiki.apache.org/confluence/display/MYRIAD/Installing+using+Vagrant
>
> Step 1 seems to go fine and I can open the HDFS name node and mesos
master http ports and see the pages showing active/started. Step 2 starts
to go a little sideways as it references “myriad-executor-0.1.0.jar” which
seems to be replaced by “myriad-executor-0.2.0.jar” which I use instead.
Step 3 asks for minimum configuration changes which seem to already be
completed. However, I change the line:
>   path:
file://localhost/usr/local/libexec/mesos/myriad-executor-runnable-0.1.0.jar
> to:
>   path:
file:///usr/local/hadoop/share/hadoop/yarn/lib/myriad-executor-0.2.0.jar
>
> For step 4, I add all properties listed to the yarn-site.xml file. I then
launch the resource manager using the “yarn-daemon.sh start
resourcemanager” command.
>
> At this point, I can load the http://10.141.141.20:8192 port and see the
myriad about and API page but the http://10.141.141.20:5050/#/frameworks
page does not show myriad or hadoop as an active framework. I use the
myriad flex tab to “flex up” a small server, it appears as a pending task,
but stays pending and mesos frameworks don’t change.
>
> Interesting lines from
/usr/local/hadoop/logs/yarn-hduser-resourcemanager-vagrant-ubuntu-trusty-64.out
include the following:
> I0726 13:01:41.358747 15817 sched.cpp:164] Version: 0.24.1
> I0726 13:01:41.361140 15847 sched.cpp:262] New master detected at
master@10.0.2.15:5050
> I0726 13:01:41.361538 15847 sched.cpp:272] No credentials provided.
Attempting to register without authentication
> E0726 13:01:41.362741 15852 socket.hpp:174] Shutdown failed on fd=231:
Transport endpoint is not connected [107]
> E0726 13:01:41.363302 15852 socket.hpp:174] Shutdown failed on fd=231:
Transport endpoint is not connected [107]
> E0726 13:01:41.396867 15852 socket.hpp:174] Shutdown failed on fd=231:
Transport endpoint is not connected [107]
> Jul 26, 2016 1:01:41 PM com.google.inject.servlet.GuiceFilter setPipeline
> WARNING: Multiple Servlet injectors detected. This is a warning
indicating that you have more than one GuiceFilter running in your web
application. If this is deliberate, you may safely ignore this message. If
this is NOT deliberate however, your application may not work as expected.
> E0726 13:01:44.780588 15852 socket.hpp:174] Shutdown failed on fd=275:
Transport endpoint is not connected [107]
> E0726 13:01:51.604310 15852 socket.hpp:174] Shutdown failed on fd=275:
Transport endpoint is not connected [107]
> E0726 13:02:01.226771 15852 socket.hpp:174] Shutdown failed on fd=275:
Transport endpoint is not connected [107]
> E0726 13:02:11.525804 15852 socket.hpp:174] Shutdown failed on fd=277:
Transport endpoint is not connected [107]
> Jul 26, 2016 1:02:15 PM
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8
resolve
> SEVERE: null
> java.lang.IllegalAccessException: Class
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8
can not access a member of class javax.ws.rs.core.Response with modifiers
"protected"
>
> Any help or suggestions are much appreciated,
> David Reno
> Systems Architect, Comcast


Sync today?

2016-07-13 Thread Darin Johnson
Couldn't connect


Re: Sync tomorrow?

2016-07-27 Thread Darin Johnson
Ended up in another meeting taking longer than I thought. Sorry.

On Tue, Jul 26, 2016 at 2:27 PM, Adam Bordelon <a...@mesosphere.io> wrote:

> Yes, I'll be there. Sorry for not making it last time.
>
> On Tue, Jul 26, 2016 at 11:20 AM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > Is there going to be a sync tomorrow?
> >
> > Darin
> >
>


Sync tomorrow?

2016-08-09 Thread Darin Johnson
Trying to plan my day tomorrow.


Re: NPE in removing container

2016-07-12 Thread Darin Johnson
Hey Stephen,

I was on vacation last week; I'm looking over the logs this week.  I've got
a few ideas for a fix, but it may take me a while as I get back into work.

Darin

On Fri, Jul 1, 2016 at 2:43 AM, Stephen Gran <stephen.g...@piksel.com>
wrote:

> Hi,
>
> It's not a problem at all.  Anything I can do to help.
>
> I've attached the log file for the relevant time period.  This is hadoop
> 2.7.2 - you have a good memory :)
>
> Cheers,
>
> On 30/06/16 22:56, Darin Johnson wrote:
> > Hey Steven,
> >
> > Looks like this might be slightly different than what I was originally
> > expecting.  Sorry to keep asking for more info but it will help me
> recreate
> > the issue.  Could you possibly get me more of the ResourceManager logs?
> In
> > particular, I'm trying to figure out where upgradeNodeCapacity is getting
> > called from and any transitions of slave2.  Also, what version of hadoop
> > are you running, I think I recall it being 2.72 but should verify.
> >
> > Thanks for taken the time to work with me on this.
> >
> > Darin
> >
> > On Thu, Jun 30, 2016 at 5:10 PM, Stephen Gran <stephen.g...@piksel.com>
> > wrote:
> >
> >> Hi,
> >>
> >> Yes - the imaginatively named slave2 was a zero-sized nm at that point -
> >> I am looking at how small a pool of reserved resource I can get away
> >> with, and use FGS for burst activity.
> >>
> >>
> >> Here are all the logs related to that host:port combination around that
> >> time:
> >>
> >> 2016-06-30 19:47:43,756 INFO
> >> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor:
> >> Expired:slave2:24679 Timed out after 2 secs
> >> 2016-06-30 19:47:43,771 INFO
> >> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl:
> >> Deactivating Node slave2:24679 as it is now LOST
> >> 2016-06-30 19:47:43,771 INFO
> >> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl:
> >> slave2:24679 Node Transitioned from RUNNING to LOST
> >> 2016-06-30 19:47:43,909 INFO
> >> org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Removed task
> >> yarn_Container: [ContainerId: container_1467314892573_0009_01_05,
> >> NodeId: slave2:24679, NodeHttpAddress: slave2:23177, Resource:
> >> <memory:2048, vCores:1>, Priority: 20, Token: Token { kind:
> >> ContainerToken, service: 10.0.5.5:24679 }, ] with exit status freeing 0
> >> cpu and 1 mem.
> >> 2016-06-30 19:47:43,909 INFO
> >> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> >> Released container container_1467314892573_0009_01_05 of capacity
> >> <memory:2048, vCores:1> on host slave2:24679, which currently has 1
> >> containers, <memory:2048, vCores:1> used and <memory:2048, vCores:1>
> >> available, release resources=true
> >> 2016-06-30 19:47:43,909 INFO
> >>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> >> Application attempt appattempt_1467314892573_0009_01 released
> >> container container_1467314892573_0009_01_05 on node: host:
> >> slave2:24679 #containers=1 available=<memory:2048, vCores:1>
> >> used=<memory:2048, vCores:1> with event: KILL
> >> 2016-06-30 19:47:43,909 INFO
> >> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService:
> >> Node not found resyncing slave2:24679
> >> 2016-06-30 19:47:43,952 INFO
> >> org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Removed task
> >> yarn_Container: [ContainerId: container_1467314892573_0009_01_06,
> >> NodeId: slave2:24679, NodeHttpAddress: slave2:23177, Resource:
> >> <memory:2048, vCores:1>, Priority: 20, Token: Token { kind:
> >> ContainerToken, service: 10.0.5.5:24679 }, ] with exit status freeing 0
> >> cpu and 1 mem.
> >> 2016-06-30 19:47:43,952 INFO
> >> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
> >> Released container container_1467314892573_0009_01_06 of capacity
> >> <memory:2048, vCores:1> on host slave2:24679, which currently has 0
> >> containers, <memory:0, vCores:0> used and <memory:4096, vCores:2>
> >> available, release resources=true
> >> 2016-06-30 19:47:43,952 INFO
> >>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> >> Application attempt appattempt_1467314892573_0009_01 released
> >> container container_1467314892573_0009_01_06 on n

Re: Myriad Slack

2016-06-28 Thread Darin Johnson
We also have a dev sync every other Wednesday via Google Hangouts:
https://plus.google.com/hangouts/_/mesosphere.io/myriad

Darin

On Thu, Jun 23, 2016 at 3:01 AM, Swapnil Daingade <
swapnil.daing...@gmail.com> wrote:

> Hi Sam,
>
> Myriad is a fairly new project. The IPMC vote for Myriad 0.2 just passed
> this week.
> Given we are early in the incubation stage, its not uncommon for one or
> two vendors
> to back the project.
>
> I'll let other community members talk about their experiences deploying
> Myriad
> but Its really great that you are considering deploying Myriad in
> production.
> Your feedback will definitely help shape the road map for Myriad going
> forward.
>
> Regards
> Swapnil
>
>
>
> On 06/22/2016 11:24 PM, Sam Chen wrote:
>
>> Hi Swapnil,
>> MapR is one company to give Myriad support, right?  Any reference ?
>>
>> Regards,
>> Sam
>>
>>
>> Sent from my iPhone
>>
>> On Jun 23, 2016, at 10:39 AM, Swapnil Daingade <
>>> swapnil.daing...@gmail.com> wrote:
>>>
>>> MapR supports Myriad 0.1 currently
>>>
>>> https://www.mapr.com/products/whats-included
>>> https://www.mapr.com/products/product-overview/apache-myriad
>>>
>>> Regards
>>> Swapnil
>>>
>>>
>>> On Wed, Jun 22, 2016 at 6:51 PM, Sam Chen <rc...@linkernetworks.com>
>>>> wrote:
>>>>
>>>> Hi Darin,
>>>> Thanks for you reply. Makes sense to use Slack. Btw, we are going to use
>>>> Myriad in production, any company have capability to support this ? And
>>>> is
>>>> there any reference in production ?
>>>>
>>>> Regards,
>>>> Sam
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Jun 23, 2016, at 2:30 AM, Darin Johnson <dbjohnson1...@gmail.com>
>>>>>>
>>>>> wrote:
>>>>>
>>>>> Sam,
>>>>>
>>>>> I don't believe so.  But we do have an IRC channel #myriad on FreeNode.
>>>>>
>>>> I
>>>>
>>>>> know the mesosphere guys set up slackbots to interact with it.  I'm
>>>>> only
>>>>> there occasionally or by appointment. I did notice Kudu now uses slack,
>>>>>
>>>> so
>>>>
>>>>> maybe slack makes more sense than IRC these days, or Gitter Chat.
>>>>>
>>>>> Darin
>>>>>
>>>>> On Wed, Jun 22, 2016 at 1:55 AM, Sam Chen <rc...@linkernetworks.com>
>>>>>>
>>>>> wrote:
>>>>
>>>>> Guys,
>>>>>> Do we have Slack for Myraid?
>>>>>>
>>>>>> Regards ,
>>>>>> Sam
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>
>>>>
>>>>
>>
>>
>


Myriad is 0.2.0!

2016-06-28 Thread Darin Johnson
I've used the release script to publish the source tarball which is
available here:

https://www.apache.org/dist/incubator/myriad/myriad-0.2.0-incubating/

In addition, I've written a short release note and updated the downloads
page, in the following PR:

https://github.com/apache/incubator-myriad/pull/81

I'll leave that open until Friday at 5pm or until I get 3 +1's.  Once
that's done I'll update the svn for the website.

Darin


Re: NPE in removing container

2016-06-30 Thread Darin Johnson
Stephen, thanks.  I thought I had fixed that, but perhaps a regression was
introduced in another merge.  I'll look into it; can you answer a few questions?
Was the node (slave2) a zero-sized node manager (for FGS)?  In the node
manager logs, had it recently become unhealthy?  I'm pretty concerned about
this and will try to get a patch out soon.

Thanks,

Darin
On Jun 30, 2016 3:53 PM, "Stephen Gran"  wrote:

> Hi,
>
> Just playing with the 0.2.0 release (congratulations, by the way!)
>
> I have seen this twice now, although it is by no means consistent - I
> will have a dozen successful runs, and then one of these.  This exits
> the RM, which makes it rather noticable.
>
> 2016-06-30 19:47:43,952 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Removed node slave2:24679 cluster capacity:  s:4>
> 2016-06-30 19:47:43,953 FATAL
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in
> handling event type NODE_RESOURCE_UPDATE to the scheduler
> java.lang.NullPointerException
>  at
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:563)
>  at
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.updateNodeResource(FairScheduler.java:1652)
>  at
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1222)
>  at
>
> org.apache.myriad.scheduler.yarn.MyriadFairScheduler.handle(MyriadFairScheduler.java:102)
>  at
>
> org.apache.myriad.scheduler.yarn.MyriadFairScheduler.handle(MyriadFairScheduler.java:42)
>  at
>
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:671)
>  at java.lang.Thread.run(Thread.java:745)
> 2016-06-30 19:47:43,972 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting,
> bbye..
>
> --
> Stephen Gran
> Senior Technical Architect
>
> picture the possibilities | piksel.com
> This message is private and confidential. If you have received this
> message in error, please notify the sender or serviced...@piksel.com and
> remove it from your system.
>
> Piksel Inc is a company registered in the United States New York City,
> 1250 Broadway, Suite 1902, New York, NY 10001. F No. = 2931986
>


Re: Myraid Slack

2016-06-29 Thread Darin Johnson
Having issues getting on, is anybody else able to connect?
On Jun 28, 2016 10:34 PM, "Adam Bordelon" <a...@mesosphere.io> wrote:

> (Next dev sync is tomorrow, 9am Pacific time)
>
> On Tue, Jun 28, 2016 at 12:32 PM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
>
> > We also have a dev sync every other Wednesday via Google Hangouts:
> > https://plus.google.com/hangouts/_/mesosphere.io/myriad
> >
> > Darin
> >
> > On Thu, Jun 23, 2016 at 3:01 AM, Swapnil Daingade <
> > swapnil.daing...@gmail.com> wrote:
> >
> > > Hi Sam,
> > >
> > > Myriad is a fairly new project. The IPMC vote for Myriad 0.2 just
> passed
> > > this week.
> > > Given we are early in the incubation stage, its not uncommon for one or
> > > two vendors
> > > to back the project.
> > >
> > > I'll let other community members talk about their experiences deploying
> > > Myriad
> > > but Its really great that you are considering deploying Myriad in
> > > production.
> > > Your feedback will definitely help shape the road map for Myriad going
> > > forward.
> > >
> > > Regards
> > > Swapnil
> > >
> > >
> > >
> > > On 06/22/2016 11:24 PM, Sam Chen wrote:
> > >
> > >> Hi Swapnil,
> > >> MapR is one company to give Myriad support, right?  Any reference ?
> > >>
> > >> Regards,
> > >> Sam
> > >>
> > >>
> > >> Sent from my iPhone
> > >>
> > >> On Jun 23, 2016, at 10:39 AM, Swapnil Daingade <
> > >>> swapnil.daing...@gmail.com> wrote:
> > >>>
> > >>> MapR supports Myriad 0.1 currently
> > >>>
> > >>> https://www.mapr.com/products/whats-included
> > >>> https://www.mapr.com/products/product-overview/apache-myriad
> > >>>
> > >>> Regards
> > >>> Swapnil
> > >>>
> > >>>
> > >>> On Wed, Jun 22, 2016 at 6:51 PM, Sam Chen <rc...@linkernetworks.com>
> > >>>> wrote:
> > >>>>
> > >>>> Hi Darin,
> > >>>> Thanks for you reply. Makes sense to use Slack. Btw, we are going to
> > use
> > >>>> Myriad in production, any company have capability to support this ?
> > And
> > >>>> is
> > >>>> there any reference in production ?
> > >>>>
> > >>>> Regards,
> > >>>> Sam
> > >>>>
> > >>>> Sent from my iPhone
> > >>>>
> > >>>> On Jun 23, 2016, at 2:30 AM, Darin Johnson <dbjohnson1...@gmail.com
> >
> > >>>>>>
> > >>>>> wrote:
> > >>>>>
> > >>>>> Sam,
> > >>>>>
> > >>>>> I don't believe so.  But we do have an IRC channel #myriad on
> > FreeNode.
> > >>>>>
> > >>>> I
> > >>>>
> > >>>>> know the mesosphere guys set up slackbots to interact with it.  I'm
> > >>>>> only
> > >>>>> there occasionally or by appointment. I did notice Kudu now uses
> > slack,
> > >>>>>
> > >>>> so
> > >>>>
> > >>>>> maybe slack makes more sense than IRC these days, or Gitter Chat.
> > >>>>>
> > >>>>> Darin
> > >>>>>
> > >>>>> On Wed, Jun 22, 2016 at 1:55 AM, Sam Chen <
> rc...@linkernetworks.com>
> > >>>>>>
> > >>>>> wrote:
> > >>>>
> > >>>>> Guys,
> > >>>>>> Do we have Slack for Myraid?
> > >>>>>>
> > >>>>>> Regards ,
> > >>>>>> Sam
> > >>>>>>
> > >>>>>> Sent from my iPhone
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>
> > >>
> > >
> >
>


Re: Myraid Slack

2016-06-29 Thread Darin Johnson
Maybe you can post the link?  Maybe mine is old.
On Jun 29, 2016 12:05 PM, "Ken Sipe" <k...@mesosphere.io> wrote:

> I am on
> > On Jun 29, 2016, at 11:04 AM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
> >
> > Having issues getting on, is anybody else able to connect?
> > On Jun 28, 2016 10:34 PM, "Adam Bordelon" <a...@mesosphere.io> wrote:
> >
> >> (Next dev sync is tomorrow, 9am Pacific time)
> >>
> >> On Tue, Jun 28, 2016 at 12:32 PM, Darin Johnson <
> dbjohnson1...@gmail.com>
> >> wrote:
> >>
> >>> We also have a dev sync every other Wednesday via Google Hangouts:
> >>> https://plus.google.com/hangouts/_/mesosphere.io/myriad
> >>>
> >>> Darin
> >>>
> >>> On Thu, Jun 23, 2016 at 3:01 AM, Swapnil Daingade <
> >>> swapnil.daing...@gmail.com> wrote:
> >>>
> >>>> Hi Sam,
> >>>>
> >>>> Myriad is a fairly new project. The IPMC vote for Myriad 0.2 just
> >> passed
> >>>> this week.
> >>>> Given we are early in the incubation stage, its not uncommon for one
> or
> >>>> two vendors
> >>>> to back the project.
> >>>>
> >>>> I'll let other community members talk about their experiences
> deploying
> >>>> Myriad
> >>>> but Its really great that you are considering deploying Myriad in
> >>>> production.
> >>>> Your feedback will definitely help shape the road map for Myriad going
> >>>> forward.
> >>>>
> >>>> Regards
> >>>> Swapnil
> >>>>
> >>>>
> >>>>
> >>>> On 06/22/2016 11:24 PM, Sam Chen wrote:
> >>>>
> >>>>> Hi Swapnil,
> >>>>> MapR is one company to give Myriad support, right?  Any reference ?
> >>>>>
> >>>>> Regards,
> >>>>> Sam
> >>>>>
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>> On Jun 23, 2016, at 10:39 AM, Swapnil Daingade <
> >>>>>> swapnil.daing...@gmail.com> wrote:
> >>>>>>
> >>>>>> MapR supports Myriad 0.1 currently
> >>>>>>
> >>>>>> https://www.mapr.com/products/whats-included
> >>>>>> https://www.mapr.com/products/product-overview/apache-myriad
> >>>>>>
> >>>>>> Regards
> >>>>>> Swapnil
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Jun 22, 2016 at 6:51 PM, Sam Chen <rc...@linkernetworks.com
> >
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi Darin,
> >>>>>>> Thanks for you reply. Makes sense to use Slack. Btw, we are going
> to
> >>> use
> >>>>>>> Myriad in production, any company have capability to support this ?
> >>> And
> >>>>>>> is
> >>>>>>> there any reference in production ?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Sam
> >>>>>>>
> >>>>>>> Sent from my iPhone
> >>>>>>>
> >>>>>>> On Jun 23, 2016, at 2:30 AM, Darin Johnson <
> dbjohnson1...@gmail.com
> >>>
> >>>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Sam,
> >>>>>>>>
> >>>>>>>> I don't believe so.  But we do have an IRC channel #myriad on
> >>> FreeNode.
> >>>>>>>>
> >>>>>>> I
> >>>>>>>
> >>>>>>>> know the mesosphere guys set up slackbots to interact with it.
> I'm
> >>>>>>>> only
> >>>>>>>> there occasionally or by appointment. I did notice Kudu now uses
> >>> slack,
> >>>>>>>>
> >>>>>>> so
> >>>>>>>
> >>>>>>>> maybe slack makes more sense than IRC these days, or Gitter Chat.
> >>>>>>>>
> >>>>>>>> Darin
> >>>>>>>>
> >>>>>>>> On Wed, Jun 22, 2016 at 1:55 AM, Sam Chen <
> >> rc...@linkernetworks.com>
> >>>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Guys,
> >>>>>>>>> Do we have Slack for Myraid?
> >>>>>>>>>
> >>>>>>>>> Regards ,
> >>>>>>>>> Sam
> >>>>>>>>>
> >>>>>>>>> Sent from my iPhone
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>


Re: Myraid Slack

2016-06-29 Thread Darin Johnson
Still no luck off wifi or cell.
On Jun 29, 2016 12:11 PM, "Ken Sipe" <k...@mesosphere.io> wrote:

> https://plus.google.com/hangouts/_/mesosphere.io/myriad <
> https://plus.google.com/hangouts/_/mesosphere.io/myriad>
>
>
> > On Jun 29, 2016, at 11:08 AM, yuliya Feldman <yufeld...@yahoo.com.INVALID>
> wrote:
> >
> > no luck joining so far
> >
> >  From: Ken Sipe <k...@mesosphere.io>
> > To: dev@myriad.incubator.apache.org
> > Sent: Wednesday, June 29, 2016 9:04 AM
> > Subject: Re: Myraid Slack
> >
> > I am on
> >> On Jun 29, 2016, at 11:04 AM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
> >>
> >> Having issues getting on, is anybody else able to connect?
> >> On Jun 28, 2016 10:34 PM, "Adam Bordelon" <a...@mesosphere.io> wrote:
> >>
> >>> (Next dev sync is tomorrow, 9am Pacific time)
> >>>
> >>> On Tue, Jun 28, 2016 at 12:32 PM, Darin Johnson <
> dbjohnson1...@gmail.com>
> >>> wrote:
> >>>
> >>>> We also have a dev sync every other Wednesday via Google Hangouts:
> >>>> https://plus.google.com/hangouts/_/mesosphere.io/myriad
> >>>>
> >>>> Darin
> >>>>
> >>>> On Thu, Jun 23, 2016 at 3:01 AM, Swapnil Daingade <
> >>>> swapnil.daing...@gmail.com> wrote:
> >>>>
> >>>>> Hi Sam,
> >>>>>
> >>>>> Myriad is a fairly new project. The IPMC vote for Myriad 0.2 just
> >>> passed
> >>>>> this week.
> >>>>> Given we are early in the incubation stage, its not uncommon for one
> or
> >>>>> two vendors
> >>>>> to back the project.
> >>>>>
> >>>>> I'll let other community members talk about their experiences
> deploying
> >>>>> Myriad
> >>>>> but Its really great that you are considering deploying Myriad in
> >>>>> production.
> >>>>> Your feedback will definitely help shape the road map for Myriad
> going
> >>>>> forward.
> >>>>>
> >>>>> Regards
> >>>>> Swapnil
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 06/22/2016 11:24 PM, Sam Chen wrote:
> >>>>>
> >>>>>> Hi Swapnil,
> >>>>>> MapR is one company to give Myriad support, right?  Any reference ?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Sam
> >>>>>>
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>> On Jun 23, 2016, at 10:39 AM, Swapnil Daingade <
> >>>>>>> swapnil.daing...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> MapR supports Myriad 0.1 currently
> >>>>>>>
> >>>>>>> https://www.mapr.com/products/whats-included
> >>>>>>> https://www.mapr.com/products/product-overview/apache-myriad
> >>>>>>>
> >>>>>>> Regards
> >>>>>>> Swapnil
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Jun 22, 2016 at 6:51 PM, Sam Chen <
> rc...@linkernetworks.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hi Darin,
> >>>>>>>> Thanks for you reply. Makes sense to use Slack. Btw, we are going
> to
> >>>> use
> >>>>>>>> Myriad in production, any company have capability to support this
> ?
> >>>> And
> >>>>>>>> is
> >>>>>>>> there any reference in production ?
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Sam
> >>>>>>>>
> >>>>>>>> Sent from my iPhone
> >>>>>>>>
> >>>>>>>> On Jun 23, 2016, at 2:30 AM, Darin Johnson <
> dbjohnson1...@gmail.com
> >>>>
> >>>>>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Sam,
> >>>>>>>>>
> >>>>>>>>> I don't believe so.  But we do have an IRC channel #myriad on
> >>>> FreeNode.
> >>>>>>>>>
> >>>>>>>> I
> >>>>>>>>
> >>>>>>>>> know the mesosphere guys set up slackbots to interact with it.
> I'm
> >>>>>>>>> only
> >>>>>>>>> there occasionally or by appointment. I did notice Kudu now uses
> >>>> slack,
> >>>>>>>>>
> >>>>>>>> so
> >>>>>>>>
> >>>>>>>>> maybe slack makes more sense than IRC these days, or Gitter Chat.
> >>>>>>>>>
> >>>>>>>>> Darin
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 22, 2016 at 1:55 AM, Sam Chen <
> >>> rc...@linkernetworks.com>
> >>>>>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Guys,
> >>>>>>>>>> Do we have Slack for Myraid?
> >>>>>>>>>>
> >>>>>>>>>> Regards ,
> >>>>>>>>>> Sam
> >>>>>>>>>>
> >>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >
> >
>
>


Re: Myraid Slack

2016-06-29 Thread Darin Johnson
Still no luck

On Wed, Jun 29, 2016 at 12:17 PM, Ken Sipe <k...@mesosphere.io> wrote:

> Darin try one more time… I think we had a miss configuration
> > On Jun 29, 2016, at 11:15 AM, Darin Johnson <dbjohnson1...@gmail.com>
> wrote:
> >
> > Still no luck off wifi or cell.
> > On Jun 29, 2016 12:11 PM, "Ken Sipe" <k...@mesosphere.io> wrote:
> >
> >> https://plus.google.com/hangouts/_/mesosphere.io/myriad <
> >> https://plus.google.com/hangouts/_/mesosphere.io/myriad>
> >>
> >>
> >>> On Jun 29, 2016, at 11:08 AM, yuliya Feldman
> <yufeld...@yahoo.com.INVALID>
> >> wrote:
> >>>
> >>> no luck joining so far
> >>>
> >>> From: Ken Sipe <k...@mesosphere.io>
> >>> To: dev@myriad.incubator.apache.org
> >>> Sent: Wednesday, June 29, 2016 9:04 AM
> >>> Subject: Re: Myraid Slack
> >>>
> >>> I am on
> >>>> On Jun 29, 2016, at 11:04 AM, Darin Johnson <dbjohnson1...@gmail.com>
> >> wrote:
> >>>>
> >>>> Having issues getting on, is anybody else able to connect?
> >>>> On Jun 28, 2016 10:34 PM, "Adam Bordelon" <a...@mesosphere.io> wrote:
> >>>>
> >>>>> (Next dev sync is tomorrow, 9am Pacific time)
> >>>>>
> >>>>> On Tue, Jun 28, 2016 at 12:32 PM, Darin Johnson <
> >> dbjohnson1...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> We also have a dev sync every other Wednesday via Google Hangouts:
> >>>>>> https://plus.google.com/hangouts/_/mesosphere.io/myriad
> >>>>>>
> >>>>>> Darin
> >>>>>>
> >>>>>> On Thu, Jun 23, 2016 at 3:01 AM, Swapnil Daingade <
> >>>>>> swapnil.daing...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Sam,
> >>>>>>>
> >>>>>>> Myriad is a fairly new project. The IPMC vote for Myriad 0.2 just
> >>>>> passed
> >>>>>>> this week.
> >>>>>>> Given we are early in the incubation stage, its not uncommon for
> one
> >> or
> >>>>>>> two vendors
> >>>>>>> to back the project.
> >>>>>>>
> >>>>>>> I'll let other community members talk about their experiences
> >> deploying
> >>>>>>> Myriad
> >>>>>>> but Its really great that you are considering deploying Myriad in
> >>>>>>> production.
> >>>>>>> Your feedback will definitely help shape the road map for Myriad
> >> going
> >>>>>>> forward.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>> Swapnil
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 06/22/2016 11:24 PM, Sam Chen wrote:
> >>>>>>>
> >>>>>>>> Hi Swapnil,
> >>>>>>>> MapR is one company to give Myriad support, right?  Any reference
> ?
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Sam
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Sent from my iPhone
> >>>>>>>>
> >>>>>>>> On Jun 23, 2016, at 10:39 AM, Swapnil Daingade <
> >>>>>>>>> swapnil.daing...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> MapR supports Myriad 0.1 currently
> >>>>>>>>>
> >>>>>>>>> https://www.mapr.com/products/whats-included
> >>>>>>>>> https://www.mapr.com/products/product-overview/apache-myriad
> >>>>>>>>>
> >>>>>>>>> Regards
> >>>>>>>>> Swapnil
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 22, 2016 at 6:51 PM, Sam Chen <
> >> rc...@linkernetworks.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Darin,
> >>>>>>>>>> Thanks for you reply. Makes sense to use Slack. Btw, we are
> going
> >> to
> >>>>>> use
> >>>>>>>>>> Myriad in production, any company have capability to support
> this
> >> ?
> >>>>>> And
> >>>>>>>>>> is
> >>>>>>>>>> there any reference in production ?
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Sam
> >>>>>>>>>>
> >>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>
> >>>>>>>>>> On Jun 23, 2016, at 2:30 AM, Darin Johnson <
> >> dbjohnson1...@gmail.com
> >>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Sam,
> >>>>>>>>>>>
> >>>>>>>>>>> I don't believe so.  But we do have an IRC channel #myriad on
> >>>>>> FreeNode.
> >>>>>>>>>>>
> >>>>>>>>>> I
> >>>>>>>>>>
> >>>>>>>>>>> know the mesosphere guys set up slackbots to interact with it.
> >> I'm
> >>>>>>>>>>> only
> >>>>>>>>>>> there occasionally or by appointment. I did notice Kudu now
> uses
> >>>>>> slack,
> >>>>>>>>>>>
> >>>>>>>>>> so
> >>>>>>>>>>
> >>>>>>>>>>> maybe slack makes more sense than IRC these days, or Gitter
> Chat.
> >>>>>>>>>>>
> >>>>>>>>>>> Darin
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jun 22, 2016 at 1:55 AM, Sam Chen <
> >>>>> rc...@linkernetworks.com>
> >>>>>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Guys,
> >>>>>>>>>>>> Do we have Slack for Myraid?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards ,
> >>>>>>>>>>>> Sam
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>>
> >>
> >>
>
>


Website is updated, 0.2.0 is official!

2016-06-29 Thread Darin Johnson
http://myriad.apache.org/

Tell your friends!


Re: Resource manager error

2016-08-16 Thread Darin Johnson
Hey Matthew, my coworker found the same issue recently; I fixed it in my
last pull request, if you'd like to pull from master.

Alternatively, you could comment out the appendCgroups line in
myriad-scheduler/src/main/java/org/apache/myriad/scheduler/NMExecutorCLGenImpl.java
and rebuild.

Sorry that missed my QA; unfortunately I'm always using cgroups and didn't
test that.  We may do a 0.2.1 release, but I can't say when.

Darin

On Aug 16, 2016 8:49 AM, "Matthew J. Loppatto" 
wrote:

> Hi,
>
>
>
> I’m setting up Myriad 0.2.0 on my Mesos cluster following this guide:
> https://cwiki.apache.org/confluence/display/MYRIAD/
> Installing+for+Developers
>
>
>
> And I get the following error in the resource manager executor log in
> mesos after starting it with `/opt/hadoop-2.7.2/bin/yarn resourcemanager`:
>
>
>
> chown: cannot access 
> ‘/sys/fs/cgroup/cpu/mesos/f5d6c530-c13d-4b1d-bc30-f298affb6442’:
> No such file or directory
>
> env: /bin/yarn: No such file or directory
>
>
>
>
> It appears the ‘mesos’ directory doesn’t exist under /sys/fs/cgroup/cpu.
> Any ideas what the issue could be?
>
>
>
> This is my yarn-site.xml:
>
>
>
> <configuration>
>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle,myriad_executor</value>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
>     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.aux-services.myriad_executor.class</name>
>     <value>org.apache.myriad.executor.MyriadExecutorAuxService</value>
>   </property>
>
>   <property>
>     <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
>     <value>2000</value>
>   </property>
>
>   <property>
>     <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
>     <value>1</value>
>   </property>
>
>   <property>
>     <name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name>
>     <value>1000</value>
>   </property>
>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-vcores</name>
>     <value>0</value>
>   </property>
>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-mb</name>
>     <value>0</value>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.resource.cpu-vcores</name>
>     <value>${nodemanager.resource.cpu-vcores}</value>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.resource.memory-mb</name>
>     <value>${nodemanager.resource.memory-mb}</value>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.address</name>
>     <value>${myriad.yarn.nodemanager.address}</value>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.webapp.address</name>
>     <value>${myriad.yarn.nodemanager.webapp.address}</value>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.webapp.https.address</name>
>     <value>${myriad.yarn.nodemanager.webapp.address}</value>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.localizer.address</name>
>     <value>${myriad.yarn.nodemanager.localizer.address}</value>
>   </property>
>
>   <property>
>     <name>yarn.resourcemanager.scheduler.class</name>
>     <value>org.apache.myriad.scheduler.yarn.MyriadFairScheduler</value>
>     <description>One can configure other scehdulers as well from following
>       list: org.apache.myriad.scheduler.yarn.MyriadCapacityScheduler,
>       org.apache.myriad.scheduler.yarn.MyriadFifoScheduler</description>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.pmem-check-enabled</name>
>     <value>false</value>
>   </property>
>
>   <property>
>     <name>yarn.nodemanager.vmem-check-enabled</name>
>     <value>false</value>
>   </property>
>
> </configuration>
>
>
>
>
>
> My myriad-config-default.yml:
>
>
>
> mesosMaster: zk://myip:2181/mesos
>
> checkpoint: false
>
> frameworkFailoverTimeout: 4320
>
> frameworkName: MyriadAlpha
>
> frameworkRole:
>
> frameworkUser: root # User the Node Manager runs as, required if
> nodeManagerURI set, otherwise defaults to the user
>
>  # running the resource manager.
>
> frameworkSuperUser: root  # To be depricated, currently permissions need
> set by a superuser due to Mesos-1790.  Must be
>
>  # root or have passwordless sudo. Required if
> nodeManagerURI set, ignored otherwise.
>
> nativeLibrary: /usr/local/lib/libmesos.so
>
> zkServers: myip:2181
>
> zkTimeout: 2
>
> restApiPort: 8192
>
> servedConfigPath: dist/config.tgz
>
> servedBinaryPath: dist/binary.tgz
>
> profiles:
>
> zero:  # NMs launched with this profile dynamically obtain cpu/mem from
> Mesos
>
>cpu: 0
>
>mem: 0
>
> small:
>
>cpu: 2
>
>mem: 2048
>
> medium:
>
>cpu: 4
>
>mem: 4096
>
> large:
>
>cpu: 10
>
>mem: 12288
>
> nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero
> profile.
>
> medium: 1 # 
>
> rebalancer: false
>
> haEnabled: false
>
> nodemanager:

Re: Resource manager error

2016-08-17 Thread Darin Johnson
Take a look at your Myriad configuration under yarnEnvironment.  You can
set JAVA_HOME there; that should solve the issue. See below.
yarnEnvironment:
  YARN_HOME: /usr/local/hadoop
  #HADOOP_CONF_DIR=config
  #HADOOP_TMP_DIR=$MESOS_SANDBOX
  #YARN_HOME: hadoop-2.7.0 # this should be relative if nodeManagerUri is set
  #JAVA_HOME: /usr/lib/jvm/java-default # System dependent, but sometimes necessary
  #JAVA_HOME: jre1.7.0_76 # Path to JRE distribution, relative to sandbox directory
  #JAVA_LIBRARY_PATH: /opt/mycompany/lib

On Wed, Aug 17, 2016 at 3:13 PM, Matthew J. Loppatto <mloppa...@keywcorp.com
> wrote:

> I'm running the resource manager as the root user.  Checking a few of my
> nodes, JAVA_HOME is set on all of them for the root env.  Am I ok to be
> using openjdk1.7 or do I have to use Oracle jdk?
>
> Matt
>
> -Original Message-
> From: John Yost [mailto:hokiege...@gmail.com]
> Sent: Wednesday, August 17, 2016 3:01 PM
> To: dev@myriad.incubator.apache.org
> Subject: Re: Resource manager error
>
> Progress is nice! What user are you running myriad as? root? yarn? If it
> is the former and you are running via sudo, I've seen this type of error.
> If so, sudo to the root user and then launch. Otherwise, please type in env
> if you are on linux box and confirm you see JAVA_HOME for the user you are
> launching myriad as.
>
> --John
>
> On Wed, Aug 17, 2016 at 2:56 PM, Matthew J. Loppatto <
> mloppa...@keywcorp.com
> > wrote:
>
> > Hey John,
> >
> > I set up a role for myriad, restarted mesos-master, and now I'm seeing
> > RMs starting on the Mesos UI, but they fail with the message "lost
> > with exit
> > status: 256".  The executor log says "Error: JAVA_HOME is not set and
> > could not be found."  $JAVA_HOME is set on all my slaves as far as I'm
> aware.
> > Running `java -version` confirms openjdk 1.7.0_111.  Looks like its
> > close to a working state.  Am I missing something?
> >
> > Thanks!
> > Matt
> >
> > -Original Message-
> > From: John Yost [mailto:hokiege...@gmail.com]
> > Sent: Wednesday, August 17, 2016 2:38 PM
> > To: dev@myriad.incubator.apache.org
> > Subject: Re: Resource manager error
> >
> > Please uncomment frameworkRole and then add the name of whatever Mesos
> > role you have configured that is not *. Note: at the risk of telling
> > you something you already know, you define roles in
> /etc/mesos-master/roles.
> >
> > In the meantime, I opened up a JIRA ticket and gonna fix this ASAP
> > starting now! :)
> >
> > --John
> >
> > On Wed, Aug 17, 2016 at 2:23 PM, Matthew J. Loppatto <
> > mloppa...@keywcorp.com
> > > wrote:
> >
> > > Hey Darin,
> > >
> > > Commenting out myriadFrameworkRole got rid of the log message about
> > > the missing role, but I'm still seeing the "n must be positive"
> > exception.
> > >
> > > The only other thing of interest I see in the log is WARN
> > > fair.AllocationFileLoaderService: fair-scheduler.xml not found on the
> > > classpath.  Not sure if that is causing any issue though.
> > >
> > > Matt
> > >
> > > -Original Message-
> > > From: Darin Johnson [mailto:dbjohnson1...@gmail.com]
> > > Sent: Wednesday, August 17, 2016 1:26 PM
> > > To: Dev
> > > Subject: Re: Resource manager error
> > >
> > > Hey Matt,
> > >
> > > Looking through the code, I think setting myriadFrameworkRole to "*"
> > > might be the problem.  Can you try commenting out that line in your
> > > config?  I'll double check this in a little while too.  If that
> > > works I'll submit a patch that checks that.
> > >
> > > Sorry - Myriad is still a pretty young project!  Thanks for checking
> > > it out though!
> > >
> > > Darin
> > >
> > > On Wed, Aug 17, 2016 at 11:25 AM, Matthew J. Loppatto <
> > > mloppa...@keywcorp.com> wrote:
> > >
> > > > Hey Darin,
> > > >
> > > > Pulling from master got rid of the errors I was seeing, however
> > > > I'm running into a new issue.  After starting the resource
> > > > manager, I see this in the logs:
> > > >
> > > > 2016-08-17 10:56:40,709 INFO org.apache.myriad.Main: Launching 1
> > > > NM(s) with profile medium
> > > > 2016-08-17 10:56:40,710 INFO org.apache.myriad.scheduler.
> > > MyriadOperations:
> > > > Adding 1 NM instances to cluster
> > &

Re: Do we have sync up today, or I am too late?

2016-08-24 Thread Darin Johnson
Adam and I showed up.  I'm willing to hop back on a chat if you want.

On Wed, Aug 24, 2016 at 12:21 PM, yuliya Feldman <
yufeld...@yahoo.com.invalid> wrote:

>
>


Re: [DISCUSS] handling roles in Myriad code

2016-10-28 Thread Darin Johnson
Any word from Adam or Mohit?

On Oct 20, 2016 12:36 AM, "Klaus Ma" <klaus1982...@gmail.com> wrote:

> I can help on this discussion; I used to be Mesos contributor for a year
> :).
>
> Mesos allocate regular resources based on role by DRF; and role is also
> used for reservation & quotas. So, the framework (like Myriad), may get two
> kind of resources: "*" or "myriad-s role". When Myriad launch tasks, it can
> not overuse any kind of resources: for example, if Myarid got offers:
> cpu(*):1;cpu(myriad):1, Myriad can not launch tasks by cpu(*):2 which will
> be rejected by Mesos master.
>
> Thanks
> Klaus
>
>
> On Thu, Oct 20, 2016 at 12:10 PM Yuliya <yufeld...@yahoo.com.invalid>
> wrote:
>
> > I really would like Mesosphere guys to comment here. I had a chat with
> > Adam today morning and I did not get the same impression
> >
> > Thanks,
> > Yuliya
> >
> > > On Oct 19, 2016, at 8:50 PM, Darin Johnson <dbjohnson1...@gmail.com>
> > wrote:
> > >
> > > We use roles extensively to ensure different frameworks can (or can't)
> > get
> > > resources via mechanisms such as reserved resorces and quotas.  Also if
> > you
> > > don't pay attention you can miss a lot of the resources you're given.
> I
> > > wish it was we didn't have to do all the book keeping our selves, but I
> > > suppose there are good reasons for delegating it to the framework, for
> > > instance we can choose when to fave a reserved vs a default resource.
> > >
> > > On Wed, Oct 19, 2016 at 11:30 PM, yuliya Feldman <
> > > yufeld...@yahoo.com.invalid> wrote:
> > >
> > >> I am not sure we should care about role being set or not, what if in the
> > >> future we will have multiple roles. Not even sure if presence/absence of
> > >> role should play a role (no pun intended :) ).
> > >>
> > >>  From: Darin Johnson <dbjohnson1...@gmail.com>
> > >> To: Dev <dev@myriad.incubator.apache.org>; yuliya Feldman <
> > >> yufeld...@yahoo.com>
> > >> Sent: Wednesday, October 19, 2016 7:17 PM
> > >> Subject: Re: [DISCUSS] handling roles in Myriad code
> > >>
> > >> Ah so if I understand correctly, if frameworkRole='*' is present in the
> > >> config, it's handled as though it's the framework role.  I believe when I
> > >> was testing I was using frameworkRole="test" or commenting out
> > >> frameworkRole="test".  It looks as though in MyriadConfiguration,
> > >> getFrameworkRole now returns "*" even if not set.
> > >>
> > >> Seems like we should be able to add a check like r.hasRole() &&
> > >> r.getRole().equals(role) && !role.equals("*") in a few places. Though it
> > >> may be better to think about a better approach here.
> > >>
> > >> Darin
> > >>
> > >> On Wed, Oct 19, 2016 at 9:28 PM, yuliya Feldman
> > >> <yufeld...@yahoo.com.invalid
> > >>> wrote:
> > >>
> > >>> Hello Darin,
> > >>> I kind of see the point regarding JHS ports. Maybe there is truth to it.
> > >>> Regarding my issues with role/no role.
> > >>> I had this issue for NMs with random ports (not hardcoded), as it has a
> > >>> different code path when role is present and when it is not. My
> > >>> impression is those are bugs.
> > >>> I am happy to point you to the places in the code that caused issues on
> > >>> master (at least for me). [1] does not increment numDefaultValues if
> > >>> role is set (which is always set); subsequently [2] has issues. [3] same
> > >>> thing - fills out the list only if there is no role, but again it is
> > >>> always there, just set to "*".
> > >>>
> > >>>
> > >>> Regarding: >>> To handle nodemanager persistence I think we should work
> > >>> with Klaus's PR's to get the correct ports, though we'll need to use
> > >>> some disk persistence as well to keep the NM state.
> > >>> Disk persistence won't help here (not even sure NM has much state to
> > >>> persist - even if it does it should be taken care of by YARN), as
> > >>> containers
> >
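
To make the role check discussed above concrete, here is a hedged sketch of a
helper predicate. The class and method names (RoleFilter, usableByRole) are
illustrative only and not existing Myriad code; the sketch treats an offered
resource as usable only if it is unreserved ("*") or reserved for the
framework's own role, so cpu(*) and cpu(myriad) offers are counted separately
and never overcommitted:

// Illustrative helper only, not existing Myriad code.
import org.apache.mesos.Protos.Resource;

final class RoleFilter {
  private RoleFilter() {
  }

  static boolean usableByRole(Resource resource, String frameworkRole) {
    // Mesos defaults an unset role to "*" (unreserved).
    String resourceRole = resource.hasRole() ? resource.getRole() : "*";
    return "*".equals(resourceRole) || resourceRole.equals(frameworkRole);
  }
}

For example, usableByRole(cpuResource, "myriad") would accept both cpu(*) and
cpu(myriad) shares of an offer, whereas bookkeeping that only looks at "*"
would silently ignore the reserved share.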
