Re: A couple questions about shared variables

2014-09-21 Thread Matei Zaharia
Hmm, good point, this seems to have been broken by refactorings of the scheduler, but it worked in the past. Basically the solution is simple -- in a result stage, we should not apply the update for each task ID more than once -- the same way we don't call job.listener.taskSucceeded more than
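
A minimal sketch of the idea described above, applying a task's accumulator update at most once per task ID in a result stage. This is not Spark's scheduler code; the class and method names (`ResultStageAccumulators`, `task_succeeded`) are hypothetical and only illustrate the bookkeeping.

```python
class ResultStageAccumulators:
    """Hypothetical helper: applies each task's accumulator update at most once."""

    def __init__(self):
        self.totals = {}       # accumulator name -> running total
        self._applied = set()  # task IDs whose updates were already applied

    def task_succeeded(self, task_id, updates):
        """Apply `updates` unless this task ID was already seen.

        Returns True if the update was applied, False if it was skipped
        as a duplicate (e.g. a resubmitted or speculative task).
        """
        if task_id in self._applied:
            return False
        self._applied.add(task_id)
        for name, delta in updates.items():
            self.totals[name] = self.totals.get(name, 0) + delta
        return True
```

With this guard, a second completion report for the same task ID leaves the totals unchanged.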

BlockManager issues

2014-09-21 Thread Nishkam Ravi
Recently upgraded to 1.1.0. Saw a bunch of fetch failures for one of the workloads. Tried tracing the problem through change set analysis. Looks like the offending commit is 4fde28c from Aug 4th for PR1707. Please see SPARK-3633 for more details. Thanks, Nishkam

Re: BlockManager issues

2014-09-21 Thread Reynold Xin
It seems like you just need to raise the ulimit? On Sun, Sep 21, 2014 at 8:41 PM, Nishkam Ravi nr...@cloudera.com wrote: Recently upgraded to 1.1.0. Saw a bunch of fetch failures for one of the workloads. Tried tracing the problem through change set analysis. Looks like the offending commit

Re: Gaussian Mixture Model clustering

2014-09-21 Thread Meethu Mathew
Hi Evan, Sorry, I forgot to mention it. I set K to 10 for the benchmark study. On Friday 19 September 2014 11:24 PM, Evan R. Sparks wrote: Hey Meethu - what are you setting K to in the benchmarks you show? This can greatly affect the runtime. On Thu, Sep 18, 2014 at

Re: BlockManager issues

2014-09-21 Thread Patrick Wendell
Hey, the numbers you mentioned don't quite line up - did you mean PR 2711? On Sun, Sep 21, 2014 at 8:45 PM, Reynold Xin r...@databricks.com wrote: It seems like you just need to raise the ulimit? On Sun, Sep 21, 2014 at 8:41 PM, Nishkam Ravi nr...@cloudera.com wrote: Recently upgraded to

Re: BlockManager issues

2014-09-21 Thread Patrick Wendell
Ah I see it was SPARK-2711 (and PR1707). In that case, it's possible that you are just having more spilling as a result of the patch and so the filesystem is opening more files. I would try increasing the ulimit. How much memory do your executors have? - Patrick On Sun, Sep 21, 2014 at 10:29
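
For reference, a small sketch of how the open-file limit mentioned above can be inspected and raised from Python on a Unix system, using the standard `resource` module. The helper names are hypothetical. Note that this only affects the current process and its children; executors are separate JVM processes that inherit their limit from whatever launches them, so in practice the limit has to be raised there.

```python
import resource


def open_file_limits():
    """Return (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft open-file limit toward `target`, capped at the hard limit.

    Never lowers the limit. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft
    # An unprivileged process cannot exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

Calling `open_file_limits()` before and after a change is a quick way to confirm which limit a process actually runs with.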

Re: BlockManager issues

2014-09-21 Thread Nishkam Ravi
Thanks for the quick follow-up, Reynold and Patrick. I tried a run with a significantly higher ulimit, but it doesn't seem to help. The executors have 35GB each. Btw, with a recent version of the branch, the error message is fetch failures as opposed to too many open files. Not sure if they are related.