Ah yes, this is due to a feature called "framework failover" in that version of 
Mesos, which has an overly large timeout by default. The idea is that if a 
framework disconnects from the master, the master gives it some time to 
reconnect before killing its executors and tasks, but by default that time is 
1 day. You can fix it by passing the parameter --failover_timeout=1 when 
running mesos-master. If you're running through the deploy scripts, add 
failover_timeout=1 to your mesos.conf.
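
For example, roughly (the flag and the conf key are as above; the exact paths 
and any other flags depend on your install):

    # starting the master by hand, with a 1-second failover timeout
    ./mesos-master --failover_timeout=1

    # or, when using the deploy scripts, add this line to mesos.conf
    failover_timeout=1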

I'll update the Spark wiki to mention this because it's come up a bunch. It 
will not be an issue in Mesos 0.9.

Matei

On Apr 20, 2012, at 10:39 AM, Scott Smith wrote:

> I'm running Spark git head / Mesos 1205738.  My cluster is small -- a
> single slave with 2 CPUs and 1.2GB of available RAM.
> 
> I can run SparkPi once, given:
> ./run spark.examples.SparkPi master@...
> 
> but I can't run it twice.  It seems that each invocation of SparkPi
> creates a new framework entry in the webui:
> 
> 201204200627-0-0022   ubuntu  SparkPi         0       0       800.0 MB        0.68    2012-04-20 17:24:47
> 
> Even after waiting for a couple of minutes, the memory is still reserved.
> 
> I'm not sure what is supposed to release the resource -- the program
> has exited, so the framework shouldn't exist anymore.  I added
> 'spark.stop()' to the end of the program but that doesn't help.  The
> only way I've found to clean up the slave is to kill and restart it.
> Doing this, however, still leaves stale empty framework entries in the
> master:
> 
> 201204200627-0-0018   ubuntu  SparkPi         0       0       0.0 MB  0.00    2012-04-20 17:09:28
> 201204200627-0-0019   ubuntu  SparkPi         0       0       0.0 MB  0.00    2012-04-20 17:17:25
> 201204200627-0-0016   ubuntu  SparkPi         0       0       0.0 MB  0.00    2012-04-20 16:50:35
> 201204200627-0-0017   ubuntu  SparkPi         0       0       0.0 MB  0.00    2012-04-20 16:51:19
> .....
> 
> I'm also not sure whether the correct behavior is instead for subsequent
> invocations of SparkPi to reuse the existing framework -- if so,
> how do I make that happen?
> 
> Thanks!
> -- 
>         Scott
