Ah thanks!  I'll probably set it to a minute or 30 seconds.
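
Just so I keep it straight, based on your example below I'm assuming
that means either of these (with 60 in place of the 1 from your
example):

    mesos-master --failover_timeout=60   # standalone master
    failover_timeout=60                  # in mesos.conf with the deploy scripts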

BTW, I discovered I'd done something stupid (again) -- I added
spark.stop() to the example and recompiled it using scalac -cp ... -d
mypi.jar mypi.scala, but that didn't actually update the jar.  I have
to delete the jar first to get it to recompile.  Once I did that, the
resources were released.
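
In case it's useful to anyone else, the end of my program now looks
roughly like this (a sketch from memory, with the names I happened to
use -- the context variable is called spark and I just call stop() at
the very end):

    import spark.SparkContext

    object MyPi {
      def main(args: Array[String]) {
        val spark = new SparkContext(args(0), "MyPi")
        val n = 100000
        // count random points that fall inside the unit circle
        val count = spark.parallelize(1 to n, 2).map { _ =>
          val x = math.random * 2 - 1
          val y = math.random * 2 - 1
          if (x * x + y * y < 1) 1 else 0
        }.reduce(_ + _)
        println("Pi is roughly " + 4.0 * count / n)
        // without this, the Mesos framework kept its memory reserved
        spark.stop()
      }
    }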

It's a bit rough trying to learn two frameworks and one language (two
if you count Java) at the same time :-)

Which version of Mesos should I use with the latest version of Spark?
I noticed that building svn HEAD generates mesos-0.9.0.jar, which I
don't think Spark knows how to find.  I can update the Spark run
script if need be, but if that isn't an approved version combination
then I won't bother.

On Fri, Apr 20, 2012 at 12:48 PM, Matei Zaharia <[email protected]> wrote:
> Ah yes, this is due to a feature called "framework failover" in that version 
> of Mesos that has an overly large timeout by default. Basically the idea is 
> that if a framework's master disconnects, we give it some time to reconnect 
> before killing its executors and tasks, but this time is by default 1 day. 
> You can fix it by adding the parameter --failover_timeout=1 when running 
> mesos-master. If you're running through the deploy scripts, add 
> failover_timeout=1 to your mesos.conf.
>
> I'll update the Spark wiki to mention this because it's come up a bunch. It 
> will not be an issue in Mesos 0.9.
>
> Matei
>
> On Apr 20, 2012, at 10:39 AM, Scott Smith wrote:
>
>> I'm running Spark git head / Mesos 1205738.  My cluster is small -- a
>> single slave with 2 CPUs and 1.2GB of available RAM.
>>
>> I can run SparkPi once, given:
>> ./run spark.examples.SparkPi master@...
>>
>> but I can't run it twice.  It seems that each invocation of SparkPi
>> creates a new framework entry in the webui:
>>
>> 201204200627-0-0022   ubuntu  SparkPi   0   0   800.0 MB   0.68   2012-04-20 17:24:47
>>
>> Even after waiting a couple of minutes, the memory is still reserved.
>>
>> I'm not sure what is supposed to release the resource -- the program
>> has exited, so the framework shouldn't exist anymore.  I added
>> 'spark.stop()' to the end of the program but that doesn't help.  The
>> only way I've found to clean up the slave is to kill and restart it.
>> Doing this, however, still leaves stale empty framework entries in the
>> master:
>>
>> 201204200627-0-0018   ubuntu  SparkPi   0   0   0.0 MB   0.00   2012-04-20 17:09:28
>> 201204200627-0-0019   ubuntu  SparkPi   0   0   0.0 MB   0.00   2012-04-20 17:17:25
>> 201204200627-0-0016   ubuntu  SparkPi   0   0   0.0 MB   0.00   2012-04-20 16:50:35
>> 201204200627-0-0017   ubuntu  SparkPi   0   0   0.0 MB   0.00   2012-04-20 16:51:19
>> .....
>>
>> I'm also not sure whether the correct behavior is instead for
>> subsequent invocations of SparkPi to reuse the existing framework --
>> if so, how do I make that happen?
>>
>> Thanks!
>> --
>>         Scott
>



-- 
        Scott
