Hi Deb,

The current state of the art is to increase spark.yarn.executor.memoryOverhead until the job stops failing. We do have plans to try to automatically scale this based on the amount of memory requested, but it will still just be a heuristic.
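For example, something along these lines on the SparkConf (the memory values below are placeholders, not recommendations; the overhead is in MB):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "8g")                 // heap requested per executor
      .set("spark.yarn.executor.memoryOverhead", "1024")  // extra MB of headroom for off-heap / JVM overhead

The same property can be passed with --conf on spark-submit; keep bumping the overhead until YARN stops killing the containers.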
-Sandy

On Tue, Sep 9, 2014 at 7:32 AM, Debasish Das <[email protected]> wrote:

> Hi Sandy,
>
> Any resolution for the YARN failures? It's a blocker for running spark on top
> of YARN.
>
> Thanks.
> Deb
>
> On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng <[email protected]> wrote:
>
>> Hi Deb,
>>
>> I think this may be the same issue as described in
>> https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
>> container got killed by YARN because it used much more memory than it
>> requested. But we haven't figured out the root cause yet.
>>
>> +Sandy
>>
>> Best,
>> Xiangrui
>>
>> On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > During the 4th ALS iteration, I am noticing that one of the executors
>> > gets disconnected:
>> >
>> > 14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding
>> > SendingConnectionManagerId not found
>> >
>> > 14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5
>> > disconnected, so removing it
>> >
>> > 14/08/19 23:40:00 ERROR cluster.YarnClientClusterScheduler: Lost executor 5
>> > on tblpmidn42adv-hdp.tdc.vzwcorp.com: remote Akka client disassociated
>> >
>> > 14/08/19 23:40:00 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 12)
>> >
>> > Any idea if this is a bug related to akka on YARN?
>> >
>> > I am using master.
>> >
>> > Thanks.
>> > Deb
>> >
