Ah thanks! I'll probably set it to a minute or 30 seconds. BTW I did discover I did something stupid (again) -- I added spark.stop() to the example, and recompiled it using scalac -cp ... -d mypi.jar mypi.scala, but it didn't actually update the jar. I have to delete the jar first to get it to recompile. Once I did that, it did release resources.
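For completeness, the sequence that finally picked up the change was roughly this (the -cp classpath is elided here, same as above):

  # scalac -d into the existing jar wasn't refreshing it for me, so remove it first
  rm mypi.jar
  # then recompile into a fresh jar
  scalac -cp ... -d mypi.jar mypi.scala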
It's a bit rough trying to learn two frameworks and one language (two if you
count Java) at the same time :-) Which version of Mesos should I use with the
latest version of Spark? I noticed the head of svn generated mesos-0.9.0.jar,
which I don't think Spark knows how to find. I can update the Spark run script
if need be, but if it isn't an approved version combo then I won't bother.

On Fri, Apr 20, 2012 at 12:48 PM, Matei Zaharia <[email protected]> wrote:
> Ah yes, this is due to a feature called "framework failover" in that version
> of Mesos that has an overly large timeout by default. Basically the idea is
> that if a framework's master disconnects, we give it some time to reconnect
> before killing its executors and tasks, but this time is by default 1 day.
> You can fix it by adding the parameter --failover_timeout=1 when running
> mesos-master. If you're running through the deploy scripts, add
> failover_timeout=1 to your mesos.conf.
>
> I'll update the Spark wiki to mention this because it's come up a bunch. It
> will not be an issue in Mesos 0.9.
>
> Matei
>
> On Apr 20, 2012, at 10:39 AM, Scott Smith wrote:
>
>> I'm running Spark git head / Mesos 1205738. My cluster is small -- a
>> single slave with 2 CPUs and 1.2GB of available RAM.
>>
>> I can run SparkPi once, given:
>> ./run spark.examples.SparkPi master@...
>>
>> but I can't run it twice. It seems that each invocation of SparkPi
>> creates a new framework entry in the webui:
>>
>> 201204200627-0-0022  ubuntu  SparkPi  0  0  800.0 MB  0.68  2012-04-20 17:24:47
>>
>> Even after waiting for a couple of minutes, the memory is still reserved.
>>
>> I'm not sure what is supposed to release the resource -- the program
>> has exited, so the framework shouldn't exist anymore. I added
>> 'spark.stop()' to the end of the program but that doesn't help. The
>> only way I've found to clean up the slave is to kill and restart it.
>> Doing this, however, still leaves stale empty framework entries in the
>> master:
>>
>> 201204200627-0-0018  ubuntu  SparkPi  0  0  0.0 MB  0.00  2012-04-20 17:09:28
>> 201204200627-0-0019  ubuntu  SparkPi  0  0  0.0 MB  0.00  2012-04-20 17:17:25
>> 201204200627-0-0016  ubuntu  SparkPi  0  0  0.0 MB  0.00  2012-04-20 16:50:35
>> 201204200627-0-0017  ubuntu  SparkPi  0  0  0.0 MB  0.00  2012-04-20 16:51:19
>> .....
>>
>> I'm also not sure if instead the correct behavior is that subsequent
>> invocations of SparkPi should reuse the existing framework -- if so,
>> how do I make that happen?
>>
>> Thanks!
>> --
>> Scott
>

--
Scott
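P.S. For anyone finding this thread later, Matei's workaround above boils down to one of these (flags exactly as in his note; where mesos.conf lives depends on your deploy setup):

  # when starting the master by hand:
  mesos-master --failover_timeout=1 ...

  # or, if you use the deploy scripts, add this line to mesos.conf:
  failover_timeout=1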
