Luis Alves created SPARK-13039:
----------------------------------
Summary: Spark Streaming with Mesos shuts down without any reason in the
logs
Key: SPARK-13039
URL: https://issues.apache.org/jira/browse/SPARK-13039
Project: Spark
Issue Type: Question
Components: Streaming
Affects Versions: 1.5.1
Reporter: Luis Alves
Priority: Minor
I have a Spark application running on Mesos that is being killed (this happens
every 2 days). When I look at the logs, this is what I have in the Spark driver:
{quote}
16/01/27 05:24:24 INFO JobScheduler: Starting job streaming job 1453872264000
ms.0 from job set of time 1453872264000 ms
16/01/27 05:24:24 INFO JobScheduler: Added jobs for time 1453872264000 ms
16/01/27 05:24:24 INFO SparkContext: Starting job: foreachRDD at
StreamingApplication.scala:59
16/01/27 05:24:24 INFO DAGScheduler: Got job 40085 (foreachRDD at
StreamingApplication.scala:59) with 1 output partitions
16/01/27 05:24:24 INFO DAGScheduler: Final stage: ResultStage 40085(foreachRDD
at StreamingApplication.scala:59)
16/01/27 05:24:24 INFO DAGScheduler: Parents of final stage: List()
16/01/27 05:24:24 INFO DAGScheduler: Missing parents: List()
16/01/27 05:24:24 INFO DAGScheduler: Submitting ResultStage 40085
(MapPartitionsRDD[80171] at map at StreamingApplication.scala:59), which has no
missing parents
16/01/27 05:24:24 INFO MemoryStore: ensureFreeSpace(4720) called with
curMem=147187, maxMem=560497950
16/01/27 05:24:24 INFO MemoryStore: Block broadcast_40085 stored as values in
memory (estimated size 4.6 KB, free 534.4 MB)
Killed
{quote}
And this is what I see in the Spark slaves:
{quote}
16/01/27 05:24:20 INFO BlockManager: Removing RDD 80167
16/01/27 05:24:20 INFO BlockManager: Removing RDD 80166
16/01/27 05:24:20 INFO BlockManager: Removing RDD 80166
I0127 05:24:24.070618 11142 exec.cpp:381] Executor asked to shutdown
16/01/27 05:24:24 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15:
SIGTERM
16/01/27 05:24:24 ERROR CoarseGrainedExecutorBackend: Driver 10.241.10.13:51810
disassociated! Shutting down.
16/01/27 05:24:24 INFO DiskBlockManager: Shutdown hook called
16/01/27 05:24:24 WARN ReliableDeliverySupervisor: Association with remote
system [akka.tcp://[email protected]:51810] has failed, address is now
gated for [5000] ms. Reason: [Disassociated]
16/01/27 05:24:24 INFO ShutdownHookManager: Shutdown hook called
16/01/27 05:24:24 INFO ShutdownHookManager: Deleting directory
/tmp/spark-f80464b5-1de2-461e-b78b-8ddbd077682a
{quote}
As you can see, these logs give no information about why the driver was
killed.
The Mesos version I'm using is 0.25.0.
How can I get more information about why it is being killed?
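One thing I'm considering, to get at least a timestamp next to the bare "Killed" line, is registering a JVM shutdown hook in the driver. A minimal sketch (the object and method names here are hypothetical, not from my application):

```scala
// Hypothetical sketch: log a timestamped line when the driver JVM begins
// shutting down (e.g. after receiving SIGTERM), so the abrupt "Killed" in
// the driver log is preceded by some context.
object ShutdownLogger {
  // Builds the log line; kept separate so it can be exercised directly.
  def message(now: java.time.Instant): String =
    s"[shutdown-hook] driver JVM shutting down at $now"

  // Register the hook once, early in the driver's main method.
  def install(): Unit =
    Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
      def run(): Unit = System.err.println(message(java.time.Instant.now()))
    }))
}
```

This would only confirm *when* the SIGTERM arrives, not who sent it; for the "who", I assume the Mesos master/agent logs around the same timestamp are the place to look.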
Curious fact: I also have a Spark Jobserver cluster running without any
problems (same versions).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)