That's right.

On Tue, Sep 9, 2014 at 2:04 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> Last time it did not show up on the Environment tab, but I will give it
> another shot... Expected behavior is that this env variable will show up,
> right?
>
> On Tue, Sep 9, 2014 at 12:15 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
>> I would expect 2 GB would be enough or more than enough for 16 GB
>> executors (unless ALS is using a bunch of off-heap memory?). You mentioned
>> earlier in this thread that the property wasn't showing up in the
>> Environment tab. Are you sure it's making it in?
>>
>> -Sandy
>>
>> On Tue, Sep 9, 2014 at 11:58 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>>
>>> Hmm... I did try increasing it to a few GB but have not gotten a
>>> successful run yet...
>>>
>>> Any idea, if I am using say 40 executors, each running 16 GB, what a
>>> typical spark.yarn.executor.memoryOverhead would be for, say, 100M x 10M
>>> large matrices with a few billion ratings?
>>>
>>> On Tue, Sep 9, 2014 at 10:49 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>>>
>>>> Hi Deb,
>>>>
>>>> The current state of the art is to increase
>>>> spark.yarn.executor.memoryOverhead until the job stops failing. We do
>>>> have plans to try to automatically scale this based on the amount of
>>>> memory requested, but it will still just be a heuristic.
>>>>
>>>> -Sandy
>>>>
>>>> On Tue, Sep 9, 2014 at 7:32 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>>>>
>>>>> Hi Sandy,
>>>>>
>>>>> Any resolution for the YARN failures? It's a blocker for running Spark
>>>>> on top of YARN.
>>>>>
>>>>> Thanks.
>>>>> Deb
>>>>>
>>>>> On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>>
>>>>>> Hi Deb,
>>>>>>
>>>>>> I think this may be the same issue as described in
>>>>>> https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
>>>>>> container got killed by YARN because it used much more memory than it
>>>>>> requested. But we haven't figured out the root cause yet.
>>>>>>
>>>>>> +Sandy
>>>>>>
>>>>>> Best,
>>>>>> Xiangrui
>>>>>>
>>>>>> On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > During the 4th ALS iteration, I am noticing that one of the
>>>>>> > executors gets disconnected:
>>>>>> >
>>>>>> > 14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding
>>>>>> > SendingConnectionManagerId not found
>>>>>> >
>>>>>> > 14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5
>>>>>> > disconnected, so removing it
>>>>>> >
>>>>>> > 14/08/19 23:40:00 ERROR cluster.YarnClientClusterScheduler: Lost
>>>>>> > executor 5 on tblpmidn42adv-hdp.tdc.vzwcorp.com: remote Akka client
>>>>>> > disassociated
>>>>>> >
>>>>>> > 14/08/19 23:40:00 INFO scheduler.DAGScheduler: Executor lost: 5
>>>>>> > (epoch 12)
>>>>>> >
>>>>>> > Any idea if this is a bug related to Akka on YARN?
>>>>>> >
>>>>>> > I am using master.
>>>>>> >
>>>>>> > Thanks.
>>>>>> > Deb
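
For reference, spark.yarn.executor.memoryOverhead is set like any other Spark
configuration property at submit time. A minimal sketch; the job class, jar
name, and the 2048 MB value are illustrative assumptions, not recommendations
from this thread:

```shell
# Sketch: request extra off-heap headroom for each YARN container.
# The overhead value is in MB; the advice above is to keep increasing it
# until YARN stops killing containers for exceeding their memory limit.
spark-submit \
  --master yarn-client \
  --conf spark.executor.memory=16g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --class com.example.ALSJob \
  als-job.jar
```

Whether the property actually took effect can be confirmed on the Environment
tab of the Spark UI, as discussed above.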