[ https://issues.apache.org/jira/browse/SPARK-13039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122166#comment-15122166 ]
Shixiong Zhu commented on SPARK-13039:
--------------------------------------

It may have been killed by Mesos for exceeding the memory limit, for example because of a memory leak in your app or in Streaming. Could you check the Mesos log?

> Spark Streaming with Mesos shutdown without any reason on logs
> --------------------------------------------------------------
>
>                 Key: SPARK-13039
>                 URL: https://issues.apache.org/jira/browse/SPARK-13039
>             Project: Spark
>          Issue Type: Question
>          Components: Streaming
>    Affects Versions: 1.5.1
>            Reporter: Luis Alves
>            Priority: Minor
>
> I have a Spark application running with Mesos that is being killed (this happens every two days). When I look at the logs, this is what I have in the Spark driver:
> {quote}
> 16/01/27 05:24:24 INFO JobScheduler: Starting job streaming job 1453872264000 ms.0 from job set of time 1453872264000 ms
> 16/01/27 05:24:24 INFO JobScheduler: Added jobs for time 1453872264000 ms
> 16/01/27 05:24:24 INFO SparkContext: Starting job: foreachRDD at StreamingApplication.scala:59
> 16/01/27 05:24:24 INFO DAGScheduler: Got job 40085 (foreachRDD at StreamingApplication.scala:59) with 1 output partitions
> 16/01/27 05:24:24 INFO DAGScheduler: Final stage: ResultStage 40085 (foreachRDD at StreamingApplication.scala:59)
> 16/01/27 05:24:24 INFO DAGScheduler: Parents of final stage: List()
> 16/01/27 05:24:24 INFO DAGScheduler: Missing parents: List()
> 16/01/27 05:24:24 INFO DAGScheduler: Submitting ResultStage 40085 (MapPartitionsRDD[80171] at map at StreamingApplication.scala:59), which has no missing parents
> 16/01/27 05:24:24 INFO MemoryStore: ensureFreeSpace(4720) called with curMem=147187, maxMem=560497950
> 16/01/27 05:24:24 INFO MemoryStore: Block broadcast_40085 stored as values in memory (estimated size 4.6 KB, free 534.4 MB)
> Killed
> {quote}
> And this is what I see in the Spark slaves:
> {quote}
> 16/01/27 05:24:20 INFO BlockManager: Removing RDD 80167
> 16/01/27 05:24:20 INFO BlockManager: Removing RDD 80166
> 16/01/27 05:24:20 INFO BlockManager: Removing RDD 80166
> I0127 05:24:24.070618 11142 exec.cpp:381] Executor asked to shutdown
> 16/01/27 05:24:24 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
> 16/01/27 05:24:24 ERROR CoarseGrainedExecutorBackend: Driver 10.241.10.13:51810 disassociated! Shutting down.
> 16/01/27 05:24:24 INFO DiskBlockManager: Shutdown hook called
> 16/01/27 05:24:24 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@10.241.10.13:51810] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
> 16/01/27 05:24:24 INFO ShutdownHookManager: Shutdown hook called
> 16/01/27 05:24:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-f80464b5-1de2-461e-b78b-8ddbd077682a
> {quote}
> As you can see, this doesn't give any information about why the driver was killed.
> The Mesos version I'm using is 0.25.0.
> How can I get more information about why it is being killed?
> Curious fact: I also have a Spark Jobserver cluster running without any problem (same versions).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
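As a follow-up to the suggestion above to check the Mesos log: a minimal sketch of the kind of search one might run on a Mesos agent (slave) host. The log path and the exact wording of the kill message are assumptions and vary by installation; the sample log line below is illustrative, not taken from the reporter's cluster.

```shell
#!/bin/sh
# Typical places to look on a Mesos agent (paths are assumptions; adjust for
# your install):
#   - the agent log, e.g. /var/log/mesos/mesos-slave.INFO
#   - the executor sandbox stderr under the agent's work_dir
#
# For this self-contained sketch, write a sample log containing the kind of
# cgroups memory-limit kill message a Mesos containerizer can emit
# (wording is illustrative):
LOG=/tmp/mesos-agent.sample.log
cat > "$LOG" <<'EOF'
I0127 05:24:24.070618 11142 exec.cpp:381] Executor asked to shutdown
Memory limit exceeded: Requested: 1280MB Maximum Used: 1297MB
EOF

# Search for memory-limit kills around the time the driver died:
grep -i 'memory limit' "$LOG"
```

If such a line does show up, a common next step is giving the executors more headroom, e.g. via Spark's `spark.mesos.executor.memoryOverhead` setting; whether that is the right fix here depends on what the Mesos log actually shows.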