[ https://issues.apache.org/jira/browse/SPARK-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Ash updated SPARK-2586:
------------------------------
    Description: 
When running Spark with Tachyon, if the connection to the Tachyon master is 
down (due to a network problem or because the master node itself is down), 
there is no clear log or error message to indicate the cause; the job simply 
fails after repeated opaque executor losses.

Here is a sample log from running the SparkTachyonPi example with the Tachyon 
master reachable:

{noformat}
14/07/15 16:43:10 INFO Utils: Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties
14/07/15 16:43:10 WARN Utils: Your hostname, henry-pivotal.local resolves to a 
loopback address: 127.0.0.1; using 10.64.5.148 instead (on interface en5)
14/07/15 16:43:10 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another 
address
14/07/15 16:43:11 INFO SecurityManager: Changing view acls to: hsaputra
14/07/15 16:43:11 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hsaputra)
14/07/15 16:43:11 INFO Slf4jLogger: Slf4jLogger started
14/07/15 16:43:11 INFO Remoting: Starting remoting
14/07/15 16:43:11 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://sp...@office-5-148.pa.gopivotal.com:53203]
14/07/15 16:43:11 INFO Remoting: Remoting now listens on addresses: 
[akka.tcp://sp...@office-5-148.pa.gopivotal.com:53203]
14/07/15 16:43:11 INFO SparkEnv: Registering MapOutputTracker
14/07/15 16:43:11 INFO SparkEnv: Registering BlockManagerMaster
14/07/15 16:43:11 INFO DiskBlockManager: Created local directory at 
/var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3w0000gp/T/spark-local-20140715164311-e63c
14/07/15 16:43:11 INFO ConnectionManager: Bound socket to port 53204 with id = 
ConnectionManagerId(office-5-148.pa.gopivotal.com,53204)
14/07/15 16:43:11 INFO MemoryStore: MemoryStore started with capacity 2.1 GB
14/07/15 16:43:11 INFO BlockManagerMaster: Trying to register BlockManager
14/07/15 16:43:11 INFO BlockManagerMasterActor: Registering block manager 
office-5-148.pa.gopivotal.com:53204 with 2.1 GB RAM
14/07/15 16:43:11 INFO BlockManagerMaster: Registered BlockManager
14/07/15 16:43:11 INFO HttpServer: Starting HTTP Server
14/07/15 16:43:11 INFO HttpBroadcast: Broadcast server started at 
http://10.64.5.148:53205
14/07/15 16:43:11 INFO HttpFileServer: HTTP File server directory is 
/var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3w0000gp/T/spark-b2fb12ae-4608-4833-87b6-b335da00738e
14/07/15 16:43:11 INFO HttpServer: Starting HTTP Server
14/07/15 16:43:12 INFO SparkUI: Started SparkUI at 
http://office-5-148.pa.gopivotal.com:4040
2014-07-15 16:43:12.210 java[39068:1903] Unable to load realm info from 
SCDynamicStore
14/07/15 16:43:12 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
14/07/15 16:43:12 INFO SparkContext: Added JAR 
examples/target/scala-2.10/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar at 
http://10.64.5.148:53206/jars/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar 
with timestamp 1405467792813
14/07/15 16:43:12 INFO AppClient$ClientActor: Connecting to master 
spark://henry-pivotal.local:7077...
14/07/15 16:43:12 INFO SparkContext: Starting job: reduce at 
SparkTachyonPi.scala:43
14/07/15 16:43:12 INFO DAGScheduler: Got job 0 (reduce at 
SparkTachyonPi.scala:43) with 2 output partitions (allowLocal=false)
14/07/15 16:43:12 INFO DAGScheduler: Final stage: Stage 0(reduce at 
SparkTachyonPi.scala:43)
14/07/15 16:43:12 INFO DAGScheduler: Parents of final stage: List()
14/07/15 16:43:12 INFO DAGScheduler: Missing parents: List()
14/07/15 16:43:12 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at 
SparkTachyonPi.scala:39), which has no missing parents
14/07/15 16:43:13 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 
(MappedRDD[1] at map at SparkTachyonPi.scala:39)
14/07/15 16:43:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/07/15 16:43:13 INFO SparkDeploySchedulerBackend: Connected to Spark cluster 
with app ID app-20140715164313-0000
14/07/15 16:43:13 INFO AppClient$ClientActor: Executor added: 
app-20140715164313-0000/0 on 
worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
(office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:43:13 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20140715164313-0000/0 on hostPort office-5-148.pa.gopivotal.com:52519 with 
8 cores, 512.0 MB RAM
14/07/15 16:43:13 INFO AppClient$ClientActor: Executor updated: 
app-20140715164313-0000/0 is now RUNNING
14/07/15 16:43:15 INFO SparkDeploySchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:53213/user/Executor#-423405256]
 with ID 0
14/07/15 16:43:15 INFO TaskSetManager: Re-computing pending task lists.
14/07/15 16:43:15 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 
0: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:43:15 INFO TaskSetManager: Serialized task 0.0:0 as 1428 bytes in 3 
ms
14/07/15 16:43:15 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 
0: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:43:15 INFO TaskSetManager: Serialized task 0.0:1 as 1428 bytes in 1 
ms
14/07/15 16:43:15 INFO BlockManagerMasterActor: Registering block manager 
office-5-148.pa.gopivotal.com:53218 with 294.9 MB RAM
14/07/15 16:43:16 INFO BlockManagerInfo: Added rdd_0_1 on tachyon on 
office-5-148.pa.gopivotal.com:53218 (size: 977.2 KB)
14/07/15 16:43:16 INFO BlockManagerInfo: Added rdd_0_0 on tachyon on 
office-5-148.pa.gopivotal.com:53218 (size: 977.2 KB)
14/07/15 16:43:16 INFO TaskSetManager: Finished TID 0 in 1307 ms on 
office-5-148.pa.gopivotal.com (progress: 1/2)
14/07/15 16:43:16 INFO TaskSetManager: Finished TID 1 in 1300 ms on 
office-5-148.pa.gopivotal.com (progress: 2/2)
14/07/15 16:43:16 INFO DAGScheduler: Completed ResultTask(0, 0)
14/07/15 16:43:16 INFO DAGScheduler: Completed ResultTask(0, 1)
14/07/15 16:43:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have 
all completed, from pool 
14/07/15 16:43:16 INFO DAGScheduler: Stage 0 (reduce at 
SparkTachyonPi.scala:43) finished in 3.336 s
14/07/15 16:43:16 INFO SparkContext: Job finished: reduce at 
SparkTachyonPi.scala:43, took 3.413498 s
Pi is roughly 3.14254
14/07/15 16:43:16 INFO SparkUI: Stopped Spark web UI at 
http://office-5-148.pa.gopivotal.com:4040
14/07/15 16:43:16 INFO DAGScheduler: Stopping DAGScheduler
14/07/15 16:43:16 INFO SparkDeploySchedulerBackend: Shutting down all executors
14/07/15 16:43:16 INFO SparkDeploySchedulerBackend: Asking each executor to 
shut down
14/07/15 16:43:17 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor 
stopped!
14/07/15 16:43:17 INFO ConnectionManager: Selector thread was interrupted!
14/07/15 16:43:17 INFO ConnectionManager: ConnectionManager stopped
14/07/15 16:43:17 INFO MemoryStore: MemoryStore cleared
14/07/15 16:43:17 INFO BlockManager: BlockManager stopped
14/07/15 16:43:17 INFO BlockManagerMasterActor: Stopping BlockManagerMaster
14/07/15 16:43:17 INFO BlockManagerMaster: BlockManagerMaster stopped
14/07/15 16:43:17 INFO SparkContext: Successfully stopped SparkContext
14/07/15 16:43:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down 
remote daemon.
14/07/15 16:43:17 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon 
shut down; proceeding with flushing remote transports.

Process finished with exit code 0

{noformat}

And here is the log when the Tachyon master cannot be reached — note that the 
executors die repeatedly with "Command exited with code 55" and the task fails 
"for unknown reason", with no mention of Tachyon anywhere:

{noformat}
14/07/15 16:49:17 INFO Utils: Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties
14/07/15 16:49:17 WARN Utils: Your hostname, henry-pivotal.local resolves to a 
loopback address: 127.0.0.1; using 10.64.5.148 instead (on interface en5)
14/07/15 16:49:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another 
address
14/07/15 16:49:17 INFO SecurityManager: Changing view acls to: hsaputra
14/07/15 16:49:17 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hsaputra)
14/07/15 16:49:17 INFO Slf4jLogger: Slf4jLogger started
14/07/15 16:49:17 INFO Remoting: Starting remoting
14/07/15 16:49:17 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://sp...@office-5-148.pa.gopivotal.com:54541]
14/07/15 16:49:17 INFO Remoting: Remoting now listens on addresses: 
[akka.tcp://sp...@office-5-148.pa.gopivotal.com:54541]
14/07/15 16:49:17 INFO SparkEnv: Registering MapOutputTracker
14/07/15 16:49:17 INFO SparkEnv: Registering BlockManagerMaster
14/07/15 16:49:17 INFO DiskBlockManager: Created local directory at 
/var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3w0000gp/T/spark-local-20140715164917-bf9e
14/07/15 16:49:17 INFO ConnectionManager: Bound socket to port 54542 with id = 
ConnectionManagerId(office-5-148.pa.gopivotal.com,54542)
14/07/15 16:49:17 INFO MemoryStore: MemoryStore started with capacity 2.1 GB
14/07/15 16:49:17 INFO BlockManagerMaster: Trying to register BlockManager
14/07/15 16:49:17 INFO BlockManagerMasterActor: Registering block manager 
office-5-148.pa.gopivotal.com:54542 with 2.1 GB RAM
14/07/15 16:49:17 INFO BlockManagerMaster: Registered BlockManager
14/07/15 16:49:17 INFO HttpServer: Starting HTTP Server
14/07/15 16:49:17 INFO HttpBroadcast: Broadcast server started at 
http://10.64.5.148:54543
14/07/15 16:49:17 INFO HttpFileServer: HTTP File server directory is 
/var/folders/nv/nsr_3ysj0wgfq93fqp0rdt3w0000gp/T/spark-400178c7-8c6e-4e44-9610-926bd1f84877
14/07/15 16:49:17 INFO HttpServer: Starting HTTP Server
14/07/15 16:49:18 INFO SparkUI: Started SparkUI at 
http://office-5-148.pa.gopivotal.com:4040
2014-07-15 16:49:18.144 java[39346:1903] Unable to load realm info from 
SCDynamicStore
14/07/15 16:49:18 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
14/07/15 16:49:18 INFO SparkContext: Added JAR 
examples/target/scala-2.10/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar at 
http://10.64.5.148:54544/jars/spark-examples-1.1.0-SNAPSHOT-hadoop2.4.0.jar 
with timestamp 1405468158551
14/07/15 16:49:18 INFO AppClient$ClientActor: Connecting to master 
spark://henry-pivotal.local:7077...
14/07/15 16:49:18 INFO SparkContext: Starting job: reduce at 
SparkTachyonPi.scala:43
14/07/15 16:49:18 INFO DAGScheduler: Got job 0 (reduce at 
SparkTachyonPi.scala:43) with 2 output partitions (allowLocal=false)
14/07/15 16:49:18 INFO DAGScheduler: Final stage: Stage 0(reduce at 
SparkTachyonPi.scala:43)
14/07/15 16:49:18 INFO DAGScheduler: Parents of final stage: List()
14/07/15 16:49:18 INFO DAGScheduler: Missing parents: List()
14/07/15 16:49:18 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at 
SparkTachyonPi.scala:39), which has no missing parents
14/07/15 16:49:18 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 
(MappedRDD[1] at map at SparkTachyonPi.scala:39)
14/07/15 16:49:18 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/07/15 16:49:18 INFO SparkDeploySchedulerBackend: Connected to Spark cluster 
with app ID app-20140715164918-0001
14/07/15 16:49:18 INFO AppClient$ClientActor: Executor added: 
app-20140715164918-0001/0 on 
worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
(office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:49:18 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20140715164918-0001/0 on hostPort office-5-148.pa.gopivotal.com:52519 with 
8 cores, 512.0 MB RAM
14/07/15 16:49:18 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/0 is now RUNNING
14/07/15 16:49:20 INFO SparkDeploySchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54548/user/Executor#-221675010]
 with ID 0
14/07/15 16:49:20 INFO TaskSetManager: Re-computing pending task lists.
14/07/15 16:49:20 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 
0: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:20 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 1 
ms
14/07/15 16:49:20 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 
0: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:20 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 0 
ms
14/07/15 16:49:20 INFO BlockManagerMasterActor: Registering block manager 
office-5-148.pa.gopivotal.com:54553 with 294.9 MB RAM
14/07/15 16:49:26 INFO SparkDeploySchedulerBackend: Executor 0 disconnected, so 
removing it
14/07/15 16:49:26 ERROR TaskSchedulerImpl: Lost executor 0 on 
office-5-148.pa.gopivotal.com: remote Akka client disassociated
14/07/15 16:49:26 INFO TaskSetManager: Re-queueing tasks for 0 from TaskSet 0.0
14/07/15 16:49:26 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/0 is now EXITED (Command exited with code 55)
14/07/15 16:49:26 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
14/07/15 16:49:26 INFO SparkDeploySchedulerBackend: Executor 
app-20140715164918-0001/0 removed: Command exited with code 55
14/07/15 16:49:26 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
14/07/15 16:49:26 INFO AppClient$ClientActor: Executor added: 
app-20140715164918-0001/1 on 
worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
(office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:49:26 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20140715164918-0001/1 on hostPort office-5-148.pa.gopivotal.com:52519 with 
8 cores, 512.0 MB RAM
14/07/15 16:49:26 INFO DAGScheduler: Executor lost: 0 (epoch 0)
14/07/15 16:49:26 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/1 is now RUNNING
14/07/15 16:49:26 INFO BlockManagerMasterActor: Trying to remove executor 0 
from BlockManagerMaster.
14/07/15 16:49:26 INFO BlockManagerMaster: Removed 0 successfully in 
removeExecutor
14/07/15 16:49:28 INFO SparkDeploySchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54573/user/Executor#1564333236]
 with ID 1
14/07/15 16:49:28 INFO TaskSetManager: Re-computing pending task lists.
14/07/15 16:49:28 INFO TaskSetManager: Starting task 0.0:0 as TID 2 on executor 
1: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:28 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 0 
ms
14/07/15 16:49:28 INFO TaskSetManager: Starting task 0.0:1 as TID 3 on executor 
1: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:28 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 0 
ms
14/07/15 16:49:28 INFO BlockManagerMasterActor: Registering block manager 
office-5-148.pa.gopivotal.com:54578 with 294.9 MB RAM
14/07/15 16:49:34 INFO SparkDeploySchedulerBackend: Executor 1 disconnected, so 
removing it
14/07/15 16:49:34 ERROR TaskSchedulerImpl: Lost executor 1 on 
office-5-148.pa.gopivotal.com: remote Akka client disassociated
14/07/15 16:49:34 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0
14/07/15 16:49:34 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
14/07/15 16:49:34 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
14/07/15 16:49:34 INFO DAGScheduler: Executor lost: 1 (epoch 1)
14/07/15 16:49:34 INFO BlockManagerMasterActor: Trying to remove executor 1 
from BlockManagerMaster.
14/07/15 16:49:34 INFO BlockManagerMaster: Removed 1 successfully in 
removeExecutor
14/07/15 16:49:34 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/1 is now EXITED (Command exited with code 55)
14/07/15 16:49:34 INFO SparkDeploySchedulerBackend: Executor 
app-20140715164918-0001/1 removed: Command exited with code 55
14/07/15 16:49:34 INFO AppClient$ClientActor: Executor added: 
app-20140715164918-0001/2 on 
worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
(office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:49:34 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20140715164918-0001/2 on hostPort office-5-148.pa.gopivotal.com:52519 with 
8 cores, 512.0 MB RAM
14/07/15 16:49:34 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/2 is now RUNNING
14/07/15 16:49:37 INFO SparkDeploySchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54599/user/Executor#-557403228]
 with ID 2
14/07/15 16:49:37 INFO TaskSetManager: Re-computing pending task lists.
14/07/15 16:49:37 INFO TaskSetManager: Starting task 0.0:1 as TID 4 on executor 
2: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:37 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 1 
ms
14/07/15 16:49:37 INFO TaskSetManager: Starting task 0.0:0 as TID 5 on executor 
2: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:37 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 0 
ms
14/07/15 16:49:37 INFO BlockManagerMasterActor: Registering block manager 
office-5-148.pa.gopivotal.com:54604 with 294.9 MB RAM
14/07/15 16:49:43 INFO SparkDeploySchedulerBackend: Executor 2 disconnected, so 
removing it
14/07/15 16:49:43 ERROR TaskSchedulerImpl: Lost executor 2 on 
office-5-148.pa.gopivotal.com: remote Akka client disassociated
14/07/15 16:49:43 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 0.0
14/07/15 16:49:43 WARN TaskSetManager: Lost TID 5 (task 0.0:0)
14/07/15 16:49:43 WARN TaskSetManager: Lost TID 4 (task 0.0:1)
14/07/15 16:49:43 INFO DAGScheduler: Executor lost: 2 (epoch 2)
14/07/15 16:49:43 INFO BlockManagerMasterActor: Trying to remove executor 2 
from BlockManagerMaster.
14/07/15 16:49:43 INFO BlockManagerMaster: Removed 2 successfully in 
removeExecutor
14/07/15 16:49:43 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/2 is now EXITED (Command exited with code 55)
14/07/15 16:49:43 INFO SparkDeploySchedulerBackend: Executor 
app-20140715164918-0001/2 removed: Command exited with code 55
14/07/15 16:49:43 INFO AppClient$ClientActor: Executor added: 
app-20140715164918-0001/3 on 
worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
(office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:49:43 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20140715164918-0001/3 on hostPort office-5-148.pa.gopivotal.com:52519 with 
8 cores, 512.0 MB RAM
14/07/15 16:49:43 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/3 is now RUNNING
14/07/15 16:49:45 INFO SparkDeploySchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54627/user/Executor#-1697612197]
 with ID 3
14/07/15 16:49:45 INFO TaskSetManager: Re-computing pending task lists.
14/07/15 16:49:45 INFO TaskSetManager: Starting task 0.0:1 as TID 6 on executor 
3: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:45 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 0 
ms
14/07/15 16:49:45 INFO TaskSetManager: Starting task 0.0:0 as TID 7 on executor 
3: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:45 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 0 
ms
14/07/15 16:49:45 INFO BlockManagerMasterActor: Registering block manager 
office-5-148.pa.gopivotal.com:54634 with 294.9 MB RAM
14/07/15 16:49:51 INFO SparkDeploySchedulerBackend: Executor 3 disconnected, so 
removing it
14/07/15 16:49:51 ERROR TaskSchedulerImpl: Lost executor 3 on 
office-5-148.pa.gopivotal.com: remote Akka client disassociated
14/07/15 16:49:51 INFO TaskSetManager: Re-queueing tasks for 3 from TaskSet 0.0
14/07/15 16:49:51 WARN TaskSetManager: Lost TID 7 (task 0.0:0)
14/07/15 16:49:51 ERROR TaskSetManager: Task 0.0:0 failed 4 times; aborting job
14/07/15 16:49:51 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have 
all completed, from pool 
14/07/15 16:49:51 INFO TaskSchedulerImpl: Cancelling stage 0
14/07/15 16:49:51 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/3 is now EXITED (Command exited with code 55)
14/07/15 16:49:51 INFO DAGScheduler: Failed to run reduce at 
SparkTachyonPi.scala:43
Exception in thread "main" 14/07/15 16:49:51 INFO SparkDeploySchedulerBackend: 
Executor app-20140715164918-0001/3 removed: Command exited with code 55
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 
failed 4 times, most recent failure: TID 7 on host 
office-5-148.pa.gopivotal.com failed for unknown reason
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1046)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1030)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1028)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:632)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:632)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:632)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1231)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/07/15 16:49:51 INFO AppClient$ClientActor: Executor added: 
app-20140715164918-0001/4 on 
worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
(office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:49:51 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20140715164918-0001/4 on hostPort office-5-148.pa.gopivotal.com:52519 with 
8 cores, 512.0 MB RAM
14/07/15 16:49:51 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/4 is now RUNNING
14/07/15 16:49:51 INFO DAGScheduler: Executor lost: 3 (epoch 3)
14/07/15 16:49:51 INFO BlockManagerMasterActor: Trying to remove executor 3 
from BlockManagerMaster.
14/07/15 16:49:51 INFO BlockManagerMaster: Removed 3 successfully in 
removeExecutor

Process finished with exit code 1
{noformat}
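One way to surface this failure clearly would be to probe the Tachyon master before the store is used and fail fast with an explicit message, instead of letting each executor die with an unexplained exit code 55. A minimal sketch of that idea (hypothetical helper names, using a plain socket probe rather than Tachyon's actual client API):

```scala
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper: check that the Tachyon master is reachable before
// initializing the off-heap store, and raise a descriptive error otherwise.
object TachyonMasterCheck {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  def isReachable(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      case _: java.io.IOException => false
    } finally {
      socket.close()
    }
  }

  // Fail fast with a clear error message naming the unreachable master,
  // rather than surfacing only "Command exited with code 55" later.
  def requireReachable(host: String, port: Int): Unit = {
    if (!isReachable(host, port)) {
      throw new RuntimeException(
        s"Cannot connect to Tachyon master at $host:$port. " +
        "Check that the Tachyon master is running and reachable over the network.")
    }
  }
}
```

Something along these lines in the Tachyon store initialization path would turn the repeated executor deaths above into a single, self-explanatory error at startup.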

with app ID app-20140715164918-0001
14/07/15 16:49:18 INFO AppClient$ClientActor: Executor added: 
app-20140715164918-0001/0 on 
worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
(office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:49:18 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20140715164918-0001/0 on hostPort office-5-148.pa.gopivotal.com:52519 with 
8 cores, 512.0 MB RAM
14/07/15 16:49:18 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/0 is now RUNNING
14/07/15 16:49:20 INFO SparkDeploySchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54548/user/Executor#-221675010]
 with ID 0
14/07/15 16:49:20 INFO TaskSetManager: Re-computing pending task lists.
14/07/15 16:49:20 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 
0: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:20 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 1 
ms
14/07/15 16:49:20 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 
0: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:20 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 0 
ms
14/07/15 16:49:20 INFO BlockManagerMasterActor: Registering block manager 
office-5-148.pa.gopivotal.com:54553 with 294.9 MB RAM
14/07/15 16:49:26 INFO SparkDeploySchedulerBackend: Executor 0 disconnected, so 
removing it
14/07/15 16:49:26 ERROR TaskSchedulerImpl: Lost executor 0 on 
office-5-148.pa.gopivotal.com: remote Akka client disassociated
14/07/15 16:49:26 INFO TaskSetManager: Re-queueing tasks for 0 from TaskSet 0.0
14/07/15 16:49:26 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/0 is now EXITED (Command exited with code 55)
14/07/15 16:49:26 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
14/07/15 16:49:26 INFO SparkDeploySchedulerBackend: Executor 
app-20140715164918-0001/0 removed: Command exited with code 55
14/07/15 16:49:26 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
14/07/15 16:49:26 INFO AppClient$ClientActor: Executor added: 
app-20140715164918-0001/1 on 
worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
(office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:49:26 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20140715164918-0001/1 on hostPort office-5-148.pa.gopivotal.com:52519 with 
8 cores, 512.0 MB RAM
14/07/15 16:49:26 INFO DAGScheduler: Executor lost: 0 (epoch 0)
14/07/15 16:49:26 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/1 is now RUNNING
14/07/15 16:49:26 INFO BlockManagerMasterActor: Trying to remove executor 0 
from BlockManagerMaster.
14/07/15 16:49:26 INFO BlockManagerMaster: Removed 0 successfully in 
removeExecutor
14/07/15 16:49:28 INFO SparkDeploySchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54573/user/Executor#1564333236]
 with ID 1
14/07/15 16:49:28 INFO TaskSetManager: Re-computing pending task lists.
14/07/15 16:49:28 INFO TaskSetManager: Starting task 0.0:0 as TID 2 on executor 
1: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:28 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 0 
ms
14/07/15 16:49:28 INFO TaskSetManager: Starting task 0.0:1 as TID 3 on executor 
1: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:28 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 0 
ms
14/07/15 16:49:28 INFO BlockManagerMasterActor: Registering block manager 
office-5-148.pa.gopivotal.com:54578 with 294.9 MB RAM
14/07/15 16:49:34 INFO SparkDeploySchedulerBackend: Executor 1 disconnected, so 
removing it
14/07/15 16:49:34 ERROR TaskSchedulerImpl: Lost executor 1 on 
office-5-148.pa.gopivotal.com: remote Akka client disassociated
14/07/15 16:49:34 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0
14/07/15 16:49:34 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
14/07/15 16:49:34 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
14/07/15 16:49:34 INFO DAGScheduler: Executor lost: 1 (epoch 1)
14/07/15 16:49:34 INFO BlockManagerMasterActor: Trying to remove executor 1 
from BlockManagerMaster.
14/07/15 16:49:34 INFO BlockManagerMaster: Removed 1 successfully in 
removeExecutor
14/07/15 16:49:34 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/1 is now EXITED (Command exited with code 55)
14/07/15 16:49:34 INFO SparkDeploySchedulerBackend: Executor 
app-20140715164918-0001/1 removed: Command exited with code 55
14/07/15 16:49:34 INFO AppClient$ClientActor: Executor added: 
app-20140715164918-0001/2 on 
worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
(office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:49:34 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20140715164918-0001/2 on hostPort office-5-148.pa.gopivotal.com:52519 with 
8 cores, 512.0 MB RAM
14/07/15 16:49:34 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/2 is now RUNNING
14/07/15 16:49:37 INFO SparkDeploySchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54599/user/Executor#-557403228]
 with ID 2
14/07/15 16:49:37 INFO TaskSetManager: Re-computing pending task lists.
14/07/15 16:49:37 INFO TaskSetManager: Starting task 0.0:1 as TID 4 on executor 
2: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:37 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 1 
ms
14/07/15 16:49:37 INFO TaskSetManager: Starting task 0.0:0 as TID 5 on executor 
2: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:37 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 0 
ms
14/07/15 16:49:37 INFO BlockManagerMasterActor: Registering block manager 
office-5-148.pa.gopivotal.com:54604 with 294.9 MB RAM
14/07/15 16:49:43 INFO SparkDeploySchedulerBackend: Executor 2 disconnected, so 
removing it
14/07/15 16:49:43 ERROR TaskSchedulerImpl: Lost executor 2 on 
office-5-148.pa.gopivotal.com: remote Akka client disassociated
14/07/15 16:49:43 INFO TaskSetManager: Re-queueing tasks for 2 from TaskSet 0.0
14/07/15 16:49:43 WARN TaskSetManager: Lost TID 5 (task 0.0:0)
14/07/15 16:49:43 WARN TaskSetManager: Lost TID 4 (task 0.0:1)
14/07/15 16:49:43 INFO DAGScheduler: Executor lost: 2 (epoch 2)
14/07/15 16:49:43 INFO BlockManagerMasterActor: Trying to remove executor 2 
from BlockManagerMaster.
14/07/15 16:49:43 INFO BlockManagerMaster: Removed 2 successfully in 
removeExecutor
14/07/15 16:49:43 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/2 is now EXITED (Command exited with code 55)
14/07/15 16:49:43 INFO SparkDeploySchedulerBackend: Executor 
app-20140715164918-0001/2 removed: Command exited with code 55
14/07/15 16:49:43 INFO AppClient$ClientActor: Executor added: 
app-20140715164918-0001/3 on 
worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
(office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:49:43 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20140715164918-0001/3 on hostPort office-5-148.pa.gopivotal.com:52519 with 
8 cores, 512.0 MB RAM
14/07/15 16:49:43 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/3 is now RUNNING
14/07/15 16:49:45 INFO SparkDeploySchedulerBackend: Registered executor: 
Actor[akka.tcp://sparkexecu...@office-5-148.pa.gopivotal.com:54627/user/Executor#-1697612197]
 with ID 3
14/07/15 16:49:45 INFO TaskSetManager: Re-computing pending task lists.
14/07/15 16:49:45 INFO TaskSetManager: Starting task 0.0:1 as TID 6 on executor 
3: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:45 INFO TaskSetManager: Serialized task 0.0:1 as 1429 bytes in 0 
ms
14/07/15 16:49:45 INFO TaskSetManager: Starting task 0.0:0 as TID 7 on executor 
3: office-5-148.pa.gopivotal.com (PROCESS_LOCAL)
14/07/15 16:49:45 INFO TaskSetManager: Serialized task 0.0:0 as 1429 bytes in 0 
ms
14/07/15 16:49:45 INFO BlockManagerMasterActor: Registering block manager 
office-5-148.pa.gopivotal.com:54634 with 294.9 MB RAM
14/07/15 16:49:51 INFO SparkDeploySchedulerBackend: Executor 3 disconnected, so 
removing it
14/07/15 16:49:51 ERROR TaskSchedulerImpl: Lost executor 3 on 
office-5-148.pa.gopivotal.com: remote Akka client disassociated
14/07/15 16:49:51 INFO TaskSetManager: Re-queueing tasks for 3 from TaskSet 0.0
14/07/15 16:49:51 WARN TaskSetManager: Lost TID 7 (task 0.0:0)
14/07/15 16:49:51 ERROR TaskSetManager: Task 0.0:0 failed 4 times; aborting job
14/07/15 16:49:51 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have 
all completed, from pool 
14/07/15 16:49:51 INFO TaskSchedulerImpl: Cancelling stage 0
14/07/15 16:49:51 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/3 is now EXITED (Command exited with code 55)
14/07/15 16:49:51 INFO DAGScheduler: Failed to run reduce at 
SparkTachyonPi.scala:43
Exception in thread "main" 14/07/15 16:49:51 INFO SparkDeploySchedulerBackend: 
Executor app-20140715164918-0001/3 removed: Command exited with code 55
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 
failed 4 times, most recent failure: TID 7 on host 
office-5-148.pa.gopivotal.com failed for unknown reason
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1046)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1030)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1028)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:632)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:632)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:632)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1231)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/07/15 16:49:51 INFO AppClient$ClientActor: Executor added: 
app-20140715164918-0001/4 on 
worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
(office-5-148.pa.gopivotal.com:52519) with 8 cores
14/07/15 16:49:51 INFO SparkDeploySchedulerBackend: Granted executor ID 
app-20140715164918-0001/4 on hostPort office-5-148.pa.gopivotal.com:52519 with 
8 cores, 512.0 MB RAM
14/07/15 16:49:51 INFO AppClient$ClientActor: Executor updated: 
app-20140715164918-0001/4 is now RUNNING
14/07/15 16:49:51 INFO DAGScheduler: Executor lost: 3 (epoch 3)
14/07/15 16:49:51 INFO BlockManagerMasterActor: Trying to remove executor 3 
from BlockManagerMaster.
14/07/15 16:49:51 INFO BlockManagerMaster: Removed 3 successfully in 
removeExecutor

Process finished with exit code 1
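
Nothing in the log above points at Tachyon: the executors simply exit with code 55 and the job aborts "for unknown reason". Until Spark logs a clearer error, one workaround is to probe the Tachyon master port before submitting the job. The sketch below is only an illustration of that idea (the class name `TachyonMasterCheck` and the default master port 19998 are assumptions, not part of Spark or this ticket):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TachyonMasterCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    // This only proves the port is open, not that the Tachyon master is healthy.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 19998 is Tachyon's default master port; adjust for your deployment.
        String host = args.length > 0 ? args[0] : "localhost";
        if (!isReachable(host, 19998, 2000)) {
            System.err.println("Tachyon master at " + host
                + ":19998 is not reachable; executors will likely fail"
                + " (in this ticket they exited with code 55).");
        }
    }
}
```

Running this before the SparkTachyonPi example would have printed an explicit warning instead of leaving the user to guess from repeated executor losses.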



> Lack of information to figure out connection to Tachyon master is inactive/ 
> down
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-2586
>                 URL: https://issues.apache.org/jira/browse/SPARK-2586
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Henry Saputra
>              Labels: tachyon
>
> 14/07/15 16:49:51 INFO TaskSetManager: Re-queueing tasks for 3 from TaskSet 
> 0.0
> 14/07/15 16:49:51 WARN TaskSetManager: Lost TID 7 (task 0.0:0)
> 14/07/15 16:49:51 ERROR TaskSetManager: Task 0.0:0 failed 4 times; aborting 
> job
> 14/07/15 16:49:51 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks 
> have all completed, from pool 
> 14/07/15 16:49:51 INFO TaskSchedulerImpl: Cancelling stage 0
> 14/07/15 16:49:51 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164918-0001/3 is now EXITED (Command exited with code 55)
> 14/07/15 16:49:51 INFO DAGScheduler: Failed to run reduce at 
> SparkTachyonPi.scala:43
> Exception in thread "main" 14/07/15 16:49:51 INFO 
> SparkDeploySchedulerBackend: Executor app-20140715164918-0001/3 removed: 
> Command exited with code 55
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 
> failed 4 times, most recent failure: TID 7 on host 
> office-5-148.pa.gopivotal.com failed for unknown reason
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1046)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1030)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1028)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:632)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:632)
> at scala.Option.foreach(Option.scala:236)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:632)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1231)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/07/15 16:49:51 INFO AppClient$ClientActor: Executor added: 
> app-20140715164918-0001/4 on 
> worker-20140715164009-office-5-148.pa.gopivotal.com-52519 
> (office-5-148.pa.gopivotal.com:52519) with 8 cores
> 14/07/15 16:49:51 INFO SparkDeploySchedulerBackend: Granted executor ID 
> app-20140715164918-0001/4 on hostPort office-5-148.pa.gopivotal.com:52519 
> with 8 cores, 512.0 MB RAM
> 14/07/15 16:49:51 INFO AppClient$ClientActor: Executor updated: 
> app-20140715164918-0001/4 is now RUNNING
> 14/07/15 16:49:51 INFO DAGScheduler: Executor lost: 3 (epoch 3)
> 14/07/15 16:49:51 INFO BlockManagerMasterActor: Trying to remove executor 3 
> from BlockManagerMaster.
> 14/07/15 16:49:51 INFO BlockManagerMaster: Removed 3 successfully in 
> removeExecutor
> Process finished with exit code 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
