Hi Debasish,

The fix is to raise spark.yarn.executor.memoryOverhead until the errors go
away. This setting controls the buffer between the JVM heap size and the
amount of memory requested from YARN (JVMs can take up memory beyond their
heap size). You should also make sure that, in the YARN NodeManager
configuration, yarn.nodemanager.vmem-check-enabled is set to false. A
sketch of both settings is below.
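
Here's a minimal sketch of both changes, assuming yarn-client mode; the
app name and the 1024 MB value are illustrative, not taken from your setup
(in 1.0/1.1 the overhead is given as a plain number of megabytes). Set the
overhead before the SparkContext is created:

    import org.apache.spark.{SparkConf, SparkContext}

    // Reserve extra off-heap headroom per executor container, beyond the
    // JVM heap. Raise the value until YARN stops killing containers.
    val conf = new SparkConf()
      .setAppName("ALS")  // hypothetical app name
      .set("spark.yarn.executor.memoryOverhead", "1024")  // in MB
    val sc = new SparkContext(conf)

And in yarn-site.xml on each NodeManager, disable the virtual-memory check:

    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>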

-Sandy


On Wed, Aug 20, 2014 at 12:27 AM, Debasish Das <debasish.da...@gmail.com>
wrote:

> I could reproduce the issue in both 1.0 and 1.1 using YARN...so this is
> definitely a YARN-related problem...
>
> At least for me, the only deployment option possible right now is
> standalone...
>
>
>
> On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> Hi Deb,
>>
>> I think this may be the same issue as described in
>> https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
>> container got killed by YARN because it used much more memory than it
>> requested. But we haven't figured out the root cause yet.
>>
>> +Sandy
>>
>> Best,
>> Xiangrui
>>
>> On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das <debasish.da...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > During the 4th ALS iteration, I am noticing that one of the executors
>> > gets disconnected:
>> >
>> > 14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding SendingConnectionManagerId not found
>> >
>> > 14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5 disconnected, so removing it
>> >
>> > 14/08/19 23:40:00 ERROR cluster.YarnClientClusterScheduler: Lost executor 5 on tblpmidn42adv-hdp.tdc.vzwcorp.com: remote Akka client disassociated
>> >
>> > 14/08/19 23:40:00 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 12)
>> >
>> > Any idea if this is a bug related to Akka on YARN?
>> >
>> > I am using the master branch.
>> >
>> > Thanks.
>> > Deb
>>
>
>
