Hmm... I did try increasing it to a few GB but have not gotten a successful run yet...
Any idea, if I am using say 40 executors each running with 16 GB, what a typical spark.yarn.executor.memoryOverhead would be for large matrices around 100M x 10M with a few billion ratings?

On Tue, Sep 9, 2014 at 10:49 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> Hi Deb,
>
> The current state of the art is to increase
> spark.yarn.executor.memoryOverhead until the job stops failing. We do have
> plans to try to automatically scale this based on the amount of memory
> requested, but it will still just be a heuristic.
>
> -Sandy
>
> On Tue, Sep 9, 2014 at 7:32 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>
>> Hi Sandy,
>>
>> Any resolution for the YARN failures? It's a blocker for running Spark on
>> top of YARN.
>>
>> Thanks.
>> Deb
>>
>> On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>>> Hi Deb,
>>>
>>> I think this may be the same issue as described in
>>> https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
>>> container got killed by YARN because it used much more memory than it
>>> requested. But we haven't figured out the root cause yet.
>>>
>>> +Sandy
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>>>
>>> > Hi,
>>> >
>>> > During the 4th ALS iteration, I am noticing that one of the executors gets
>>> > disconnected:
>>> >
>>> > 14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding
>>> > SendingConnectionManagerId not found
>>> >
>>> > 14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5
>>> > disconnected, so removing it
>>> >
>>> > 14/08/19 23:40:00 ERROR cluster.YarnClientClusterScheduler: Lost executor 5
>>> > on tblpmidn42adv-hdp.tdc.vzwcorp.com: remote Akka client disassociated
>>> >
>>> > 14/08/19 23:40:00 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 12)
>>> >
>>> > Any idea if this is a bug related to Akka on YARN?
>>> >
>>> > I am using master.
>>> >
>>> > Thanks.
>>> > Deb
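
P.S. For anyone trying the same workaround Sandy describes: a minimal sketch of how the overhead can be set when building the SparkConf. The app name and the 2048 MB value are illustrative assumptions only, not a recommendation from this thread; the property takes a value in megabytes on Spark 1.x.

    // Minimal sketch (Spark 1.x on YARN). Values are illustrative assumptions.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("als-on-yarn")                          // hypothetical app name
      .set("spark.executor.memory", "16g")                // executor heap size
      .set("spark.yarn.executor.memoryOverhead", "2048")  // off-heap headroom, in MB

    val sc = new SparkContext(conf)

The same setting can also be passed on the command line with
spark-submit --conf spark.yarn.executor.memoryOverhead=2048, and then increased until the YARN container kills stop, as suggested above.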