Re: Mesos/Spark Deadlock

2014-08-25 Thread Gary Malouf
We have not tried the work-around because there are other bugs in there that affected our set-up, though it seems it would help. On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen tnac...@gmail.com wrote: +1 to have the work around in. I'll be investigating from the Mesos side too. Tim On

Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
This is kind of weird then, seems perhaps unrelated to this issue (or at least to the way I understood it). Is the problem maybe that Mesos saw 0 MB being freed and didn't re-offer the machine *even though there was more than 32 MB free overall*? Matei On August 25, 2014 at 12:59:59 PM, Cody

Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
Anyway it would be good if someone from the Mesos side investigates this and proposes a solution. The 32 MB per task hack isn't completely foolproof either (e.g. people might allocate all the RAM to their executor and thus stop being able to launch tasks), so maybe we wait on a Mesos fix for

Re: Mesos/Spark Deadlock

2014-08-25 Thread Timothy Chen
Hi Matei, I'm going to investigate from both the Mesos and Spark sides and will hopefully have a good long-term solution. In the meantime, having a workaround to start with is going to unblock folks. Tim On Mon, Aug 25, 2014 at 1:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Anyway it would be

Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
My problem is that I'm not sure this workaround would solve things, given the issue described here (where there was a lot of memory free but it didn't get re-offered). If you think it does, it would be good to explain why it behaves like that. Matei On August 25, 2014 at 2:28:18 PM, Timothy

Re: Mesos/Spark Deadlock

2014-08-25 Thread Timothy Chen
I don't think it solves Cody's problem, which still needs more investigation, but I believe it does solve the problem you described earlier. I just confirmed with the Mesos folks that we no longer need the minimum memory requirement, so we'll be dropping that soon and the workaround might not be needed

Re: Mesos/Spark Deadlock

2014-08-24 Thread Matei Zaharia
Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too bad that this happens in fine-grained mode -- would be really good to fix. I'll see if we can get the workaround in https://github.com/apache/spark/pull/1860 into Spark 1.1. Incidentally have you tried that? Matei On

Re: Mesos/Spark Deadlock

2014-08-24 Thread Timothy Chen
+1 to have the work around in. I'll be investigating from the Mesos side too. Tim On Sun, Aug 24, 2014 at 9:52 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too bad that this happens in fine-grained mode -- would be

Mesos/Spark Deadlock

2014-08-23 Thread Gary Malouf
I just wanted to bring up a significant Mesos/Spark issue that makes the combo difficult to use for teams larger than 4-5 people. It's covered in https://issues.apache.org/jira/browse/MESOS-1688. My understanding is that Spark's use of executors in fine-grained mode is a very different behavior

Re: Mesos/Spark Deadlock

2014-08-23 Thread Gary Malouf
Hi Matei, We have an analytics team that uses the cluster on a daily basis. They use two types of 'run modes': 1) For running actual queries, they set spark.executor.memory to something between 4 and 8 GB of RAM per worker. 2) A shell that takes a minimal amount of memory on workers (128 MB) for