My problem is that I'm not sure this workaround would solve things, given the 
issue described here (where there was a lot of memory free but it didn't get 
re-offered). If you think it does, it would be good to explain why it behaves 
like that.

Matei

On August 25, 2014 at 2:28:18 PM, Timothy Chen (tnac...@gmail.com) wrote:

Hi Matei, 

I'm going to investigate from both the Mesos and Spark sides and will
hopefully have a good long-term solution. In the meantime, having a
workaround to start with is going to unblock folks.

Tim 

On Mon, Aug 25, 2014 at 1:08 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote: 
> Anyway, it would be good if someone from the Mesos side investigates this and
> proposes a solution. The 32 MB per task hack isn't completely foolproof
> either (e.g. people might allocate all the RAM to their executor and thus
> stop being able to launch tasks), so maybe we should wait on a Mesos fix for
> this one.
> 
> Matei 
> 
> On August 25, 2014 at 1:07:15 PM, Matei Zaharia (matei.zaha...@gmail.com) 
> wrote: 
> 
> This is kind of weird then; it seems perhaps unrelated to this issue (or at
> least to the way I understood it). Is the problem maybe that Mesos saw 0 MB
> being freed and didn't re-offer the machine *even though there was more than
> 32 MB free overall*?
> 
> Matei 
> 
> On August 25, 2014 at 12:59:59 PM, Cody Koeninger (c...@koeninger.org) 
> wrote: 
> 
> I definitely saw a case where
>
> a. the only job running was a 256m shell
> b. I started a 2g job
> c. a little while later, the same user as in (a) started another 256m shell
>
> My job immediately stopped making progress. Once the user from (a) killed
> his shells, it started again.
> 
> This is on nodes with ~15G of memory, on which we have successfully run 8G 
> jobs. 
> 
> 
> On Mon, Aug 25, 2014 at 2:02 PM, Matei Zaharia <matei.zaha...@gmail.com> 
> wrote: 
>> 
>> BTW it seems to me that even without that patch, you should be getting 
>> tasks launched as long as you leave at least 32 MB of memory free on each 
>> machine (that is, the sum of the executor memory sizes is not exactly the 
>> same as the total size of the machine). Then Mesos will be able to re-offer 
>> that machine whenever CPUs free up. 
>> 
>> Matei 
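As a rough sketch of the headroom approach described above, assuming a Spark 1.x 
application configured programmatically (the 6g figure is illustrative; the only 
point is that executor sizes should not add up to the machine's full memory):

    // Sketch only: on a ~15 GB worker, keep the executor size well under the
    // machine total so some memory (at least 32 MB) stays unallocated and
    // Mesos can re-offer the node when CPUs free up.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "6g")   // illustrative; below the ~15 GB node total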
>> 
>> On August 25, 2014 at 5:05:56 AM, Gary Malouf (malouf.g...@gmail.com) 
>> wrote: 
>> 
>> We have not tried the workaround because there are other bugs in there
>> that affected our setup, though it seems it would help.
>> 
>> 
>> On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen <tnac...@gmail.com> wrote: 
>> 
>> > +1 to having the workaround in.
>> > 
>> > I'll be investigating from the Mesos side too. 
>> > 
>> > Tim 
>> > 
>> > On Sun, Aug 24, 2014 at 9:52 PM, Matei Zaharia <matei.zaha...@gmail.com> 
>> > wrote: 
>> > > Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too
>> > > bad that this happens in fine-grained mode -- would be really good to fix.
>> > > I'll see if we can get the workaround in
>> > > https://github.com/apache/spark/pull/1860 into Spark 1.1. Incidentally,
>> > > have you tried that?
>> > > 
>> > > Matei 
>> > > 
>> > > On August 23, 2014 at 4:30:27 PM, Gary Malouf (malouf.g...@gmail.com)
>> > > wrote:
>> > > 
>> > > Hi Matei, 
>> > > 
>> > > We have an analytics team that uses the cluster on a daily basis. They
>> > > use two types of 'run modes':
>> > >
>> > > 1) For running actual queries, they set spark.executor.memory to
>> > > something between 4 and 8 GB of RAM per worker.
>> > >
>> > > 2) A shell that takes a minimal amount of memory on workers (128 MB) for
>> > > prototyping out a larger query. This allows them to not take up RAM on
>> > > the cluster when they do not really need it.
>> > >
>> > > We see the deadlocks in either case when there are a few shells running.
>> > > From the usage patterns we have, coarse-grained mode would be a challenge,
>> > > as we have to constantly remind people to kill their shells as soon as
>> > > their queries finish.
>> > >
>> > > Am I correct in viewing Mesos in coarse-grained mode as being similar to
>> > > Spark Standalone's CPU allocation behavior?
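For concreteness, the two run modes described above might be expressed as 
configuration along these lines (a sketch only; the memory sizes come from the 
message above, the app names are made up, and in practice a shell user would 
typically set these at launch time rather than in code):

    // Sketch of the two usage patterns as SparkConf settings.
    import org.apache.spark.SparkConf

    // 1) Full query runs: a mid-sized executor on each worker (4-8 GB range).
    val queryConf = new SparkConf()
      .setAppName("analytics-query")        // hypothetical app name
      .set("spark.executor.memory", "6g")

    // 2) Prototyping shells: keep the per-worker footprint minimal.
    val shellConf = new SparkConf()
      .setAppName("prototyping-shell")      // hypothetical app name
      .set("spark.executor.memory", "128m")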
>> > > 
>> > > 
>> > > 
>> > > 
>> > > On Sat, Aug 23, 2014 at 7:16 PM, Matei Zaharia <matei.zaha...@gmail.com>
>> > > wrote:
>> > > Hey Gary, just as a workaround, note that you can use Mesos in
>> > > coarse-grained mode by setting spark.mesos.coarse=true. Then it will
>> > > hold onto CPUs for the duration of the job.
>> > > 
>> > > Matei 
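A minimal sketch of that coarse-grained setting, assuming a Spark 1.x application 
running against a Mesos cluster (the master URL and app name are hypothetical, 
used only for illustration):

    // Sketch: opting into Mesos coarse-grained mode, which holds on to CPUs
    // for the duration of the job instead of acquiring them per task.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("mesos://zk://host1:2181/mesos")   // hypothetical Mesos master URL
      .setAppName("coarse-grained-example")         // hypothetical app name
      .set("spark.mesos.coarse", "true")            // the workaround setting mentioned above
    val sc = new SparkContext(conf)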
>> > > 
>> > > On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com)
>> > > wrote:
>> > >
>> > > I just wanted to bring up a significant Mesos/Spark issue that makes the
>> > > combo difficult to use for teams larger than 4-5 people. It's covered in
>> > > https://issues.apache.org/jira/browse/MESOS-1688. My understanding is that
>> > > Spark's use of executors in fine-grained mode is very different behavior
>> > > from that of many of the other common frameworks for Mesos.
>> > > 
>> > 
> 
> 
