Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too bad that this happens in fine-grained mode -- it would be really good to fix. I'll see if we can get the workaround in https://github.com/apache/spark/pull/1860 into Spark 1.1. Incidentally, have you tried that?
Matei

On August 23, 2014 at 4:30:27 PM, Gary Malouf (malouf.g...@gmail.com) wrote:

Hi Matei,

We have an analytics team that uses the cluster on a daily basis. They use two types of 'run modes':

1) For running actual queries, they set the spark.executor.memory to something between 4 and 8GB of RAM/worker.

2) A shell that takes a minimal amount of memory on workers (128MB) for prototyping out a larger query. This allows them to not take up RAM on the cluster when they do not really need it.

We see the deadlocks when there are a few shells in either case. From the usage patterns we have, coarse-grained mode would be a challenge as we have to constantly remind people to kill their shells as soon as their queries finish.

Am I correct in viewing Mesos in coarse-grained mode as being similar to Spark Standalone's CPU allocation behavior?

On Sat, Aug 23, 2014 at 7:16 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

Hey Gary, just as a workaround, note that you can use Mesos in coarse-grained mode by setting spark.mesos.coarse=true. Then it will hold onto CPUs for the duration of the job.

Matei

On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com) wrote:

I just wanted to bring up a significant Mesos/Spark issue that makes the combo difficult to use for teams larger than 4-5 people. It's covered in https://issues.apache.org/jira/browse/MESOS-1688. My understanding is that Spark's use of executors in fine-grained mode is a very different behavior than many of the other common frameworks for Mesos.
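
For readers following along, here is a rough sketch of how the two run modes and the coarse-grained workaround mentioned in this thread might be written down as Spark properties. Only the property names (spark.executor.memory, spark.mesos.coarse) and the memory sizes come from the thread; the profile names, the `render_conf` helper, and the choice of 4g from the quoted 4-8GB range are assumptions made purely for illustration. The output uses the whitespace-separated "key value" layout of a Spark properties file.

```python
# Illustrative only: profile names and this helper are hypothetical,
# not something the thread participants described using.
RUN_MODES = {
    # "Actual queries": the thread mentions 4 to 8GB per worker; 4g is one pick.
    "query": {"spark.executor.memory": "4g"},
    # Prototyping shell: minimal memory on workers (128MB in the thread).
    "prototype-shell": {"spark.executor.memory": "128m"},
}


def render_conf(mode, coarse=False):
    """Return properties-file text for one run mode.

    coarse=True sets spark.mesos.coarse=true, the workaround suggested in
    the thread; the job then holds onto its CPUs for its whole duration.
    """
    props = dict(RUN_MODES[mode])
    props["spark.mesos.coarse"] = "true" if coarse else "false"
    # One "key value" pair per line, sorted for stable output.
    return "\n".join(f"{key} {value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    print(render_conf("prototype-shell"))
    print()
    print(render_conf("query", coarse=True))
```

As noted in the thread, the coarse-grained variant trades the fine-grained deadlock for held resources, so a long-lived shell started with it would keep its CPUs until it is killed.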