Thanks Aditya, appreciate the help.

I had the exact same thought about the huge number of executors requested.
I am using dynamic allocation and not specifying the number of executors
explicitly. Are you suggesting that I should cap the number of executors
the dynamic allocator can request?

It's a 12-node EMR cluster with more than a TB of memory.
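For anyone hitting the same thing: dynamic allocation can be given an explicit upper bound via spark.dynamicAllocation.maxExecutors. A minimal sketch below, with illustrative (not tuned) values, where my_job.py stands in for the actual job:

```shell
# Dynamic allocation with an explicit cap on executors.
# min/max values here are illustrative placeholders, not tuned for this cluster.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=100 \
  my_job.py
```

With maxExecutors set, the allocator still scales up and down with load, but the driver can never request more than the cap.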



On Fri, Sep 23, 2016 at 5:12 PM, Aditya <aditya.calangut...@augmentiq.co.in>
wrote:

> Hi Yash,
>
> What is your total cluster memory and number of cores?
> Problem might be with the number of executors you are allocating. The logs
> show it as 168510, which is very high. Try reducing your executors.
>
>
> On Friday 23 September 2016 12:34 PM, Yash Sharma wrote:
>
>> Hi All,
>> I have a spark job which runs over a huge bulk of data with Dynamic
>> allocation enabled.
>> The job takes some 15 minutes to start up and fails as soon as it starts*.
>>
>> Is there anything I can check to debug this problem? There is not a lot
>> of information in the logs about the exact cause, but here is a snapshot below.
>>
>> Thanks All.
>>
>> * - by "starts" I mean when it shows something on the Spark web UI; before
>> that it's just a blank page.
>>
>> Logs here -
>>
>> {code}
>> 16/09/23 06:33:19 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
>> 16/09/23 06:33:27 INFO YarnAllocator: Driver requested a total number of 168510 executor(s).
>> 16/09/23 06:33:27 INFO YarnAllocator: Will request 168510 executor containers, each with 2 cores and 6758 MB memory including 614 MB overhead
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 22
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 19
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 18
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 12
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 11
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 20
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 15
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 7
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 8
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 16
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 21
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 6
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 13
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 14
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 9
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 3
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 17
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 1
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 10
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 4
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 2
>> 16/09/23 06:33:36 WARN YarnAllocator: Tried to get the loss reason for non-existent executor 5
>> 16/09/23 06:33:36 WARN ApplicationMaster: Reporter thread fails 1 time(s) in a row.
>> java.lang.StackOverflowError
>>         at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>>         at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>>         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>>         at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>>         at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>>         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>>         at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>>         at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>>         at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>>         at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>>         at scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>> {code}
>>
>> ... <trimmed logs>
>>
>> {code}
>> 16/09/23 06:33:36 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 7 at RPC address , but got no response. Marking as slave lost.
>> org.apache.spark.SparkException: Fail to find loss reason for non-existent executor 7
>>         at org.apache.spark.deploy.yarn.YarnAllocator.enqueueGetLossReasonRequest(YarnAllocator.scala:554)
>>         at org.apache.spark.deploy.yarn.ApplicationMaster$AMEndpoint$$anonfun$receiveAndReply$1.applyOrElse(ApplicationMaster.scala:632)
>>         at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:104)
>>         at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
>>         at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>>         at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:745)
>>         at java.lang.Thread.run(Thread.java:745)
>> {code}
>>
>
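A quick back-of-the-envelope check on why 168510 executors is so far off. The container spec (2 cores, 6758 MB) and the request count come from the logs above; the per-node YARN memory is an assumption (~85 GB usable per node on a 12-node, >1 TB cluster), not a value from this thread:

```python
# Rough capacity check for the executor request in the logs above.
nodes = 12
yarn_mem_per_node_mb = 85 * 1024   # ASSUMED usable YARN memory per node
container_mem_mb = 6758            # from the YarnAllocator log line
requested = 168510                 # executors the driver asked for

containers_per_node = yarn_mem_per_node_mb // container_mem_mb
cluster_capacity = containers_per_node * nodes

print(containers_per_node)             # 12 containers per node
print(cluster_capacity)                # 144 containers cluster-wide
print(requested // cluster_capacity)   # roughly 1170x over capacity
```

Under those assumptions the cluster can run on the order of 144 such containers at once, so the driver is asking for three orders of magnitude more than could ever be scheduled, which is consistent with the allocator churn in the logs.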