Here is the slave log:

E0702 10:32:07.599364 14915 slave.cpp:2686] Failed to unmonitor container for executor 20140616-104524-1694607552-5050-26919-1 of framework 20140702-102939-1694607552-5050-14846-0000: Not monitored
E0702 11:35:08.869998 17840 slave.cpp:2310] Container 'af557235-2d5f-4062-aaf3-a747cb3cd0d1' for executor '20140616-104524-1694607552-5050-26919-1' of framework '20140702-113428-1694607552-5050-17766-0000' failed to start: Failed to fetch URIs for container 'af557235-2d5f-4062-aaf3-a747cb3cd0d1': exit status 32512

2014-07-01 16:24 GMT+08:00 qingyang li <[email protected]>:

> I am using Mesos 0.19 and Spark 0.9.0. The Mesos cluster is started, but
> when I use spark-shell to submit a job, the tasks are always lost. Here is
> the log:
> ----------
> 14/07/01 16:24:27 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata005
> 14/07/01 16:24:27 INFO TaskSetManager: Starting task 0.0:1 as TID 4042 on executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
> 14/07/01 16:24:27 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes in 0 ms
> 14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for 20140616-104524-1694607552-5050-26919-1 from TaskSet 0.0
> 14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4041 (task 0.0:0)
> 14/07/01 16:24:28 INFO DAGScheduler: Executor lost: 20140616-104524-1694607552-5050-26919-1 (epoch 3427)
> 14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor 20140616-104524-1694607552-5050-26919-1 from BlockManagerMaster.
> 14/07/01 16:24:28 INFO BlockManagerMaster: Removed 20140616-104524-1694607552-5050-26919-1 successfully in removeExecutor
> 14/07/01 16:24:28 INFO TaskSetManager: Re-queueing tasks for 20140616-143932-1694607552-5050-4080-2 from TaskSet 0.0
> 14/07/01 16:24:28 WARN TaskSetManager: Lost TID 4042 (task 0.0:1)
> 14/07/01 16:24:28 INFO DAGScheduler: Executor lost: 20140616-143932-1694607552-5050-4080-2 (epoch 3428)
> 14/07/01 16:24:28 INFO BlockManagerMasterActor: Trying to remove executor 20140616-143932-1694607552-5050-4080-2 from BlockManagerMaster.
> 14/07/01 16:24:28 INFO BlockManagerMaster: Removed 20140616-143932-1694607552-5050-4080-2 successfully in removeExecutor
> 14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata005
> 14/07/01 16:24:28 INFO DAGScheduler: Host gained which was in lost list earlier: bigdata001
> 14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:1 as TID 4043 on executor 20140616-143932-1694607552-5050-4080-2: bigdata005 (PROCESS_LOCAL)
> 14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:1 as 1570 bytes in 0 ms
> 14/07/01 16:24:28 INFO TaskSetManager: Starting task 0.0:0 as TID 4044 on executor 20140616-104524-1694607552-5050-26919-1: bigdata001 (PROCESS_LOCAL)
> 14/07/01 16:24:28 INFO TaskSetManager: Serialized task 0.0:0 as 1570 bytes in 0 ms
>
> It seems someone else has also encountered this problem:
>
> http://mail-archives.apache.org/mod_mbox/incubator-mesos-dev/201305.mbox/%[email protected]%3E
>
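
A note on the "exit status 32512" in the slave log: if that number is a raw POSIX wait status (an assumption; the log does not say how it was produced), it can be split into an exit code and a signal number. The sketch below does that with plain bit operations. The helper name decode_wait_status is made up for illustration and is not part of Mesos or Spark.

```python
def decode_wait_status(status: int) -> dict:
    """Split a raw POSIX wait status into exit code and signal number.

    Assumes the conventional Linux layout: the low 7 bits hold the
    terminating signal (0 if none), the next 8 bits hold the exit code.
    """
    signal_number = status & 0x7F
    exit_code = (status >> 8) & 0xFF
    return {"exit_code": exit_code, "signal": signal_number}


# 32512 is the value reported in the slave log line above.
decoded = decode_wait_status(32512)
print(decoded)
```

Under that assumption the status decodes to exit code 127 with no signal. A shell conventionally returns 127 when it cannot find the command it was asked to run, so it may be worth checking that whatever program the fetch step invokes is on the slave's PATH. The log itself does not name that program.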
