Hmm, good point, this seems to have been broken by refactorings of the
scheduler, but it worked in the past. Basically the solution is simple -- in a
result stage, we should not apply the update for each task ID more than once --
the same way we don't call job.listener.taskSucceeded more than once.
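The idea can be sketched in miniature -- this is a hypothetical, simplified model (the names `ResultStage`, `handle_task_success`, and the accumulator stand-in are illustrative, not the actual DAGScheduler code): track which result-stage partitions have already been committed and ignore duplicate task successes, e.g. from speculative or resubmitted tasks.

```python
# Illustrative sketch only, not the real Spark scheduler API: in a result
# stage, apply each task's update at most once per partition, the same way
# the job listener is notified only once per partition.

class ResultStage:
    def __init__(self, num_partitions):
        self.finished = [False] * num_partitions  # partitions already committed
        self.accumulator = 0                      # stand-in for accumulator state
        self.listener_calls = 0                   # stand-in for job.listener.taskSucceeded

    def handle_task_success(self, partition, update):
        if self.finished[partition]:
            return  # duplicate success (speculation/resubmission): skip the update
        self.finished[partition] = True
        self.accumulator += update
        self.listener_calls += 1  # fires once per partition

stage = ResultStage(num_partitions=2)
stage.handle_task_success(0, 5)
stage.handle_task_success(1, 7)
stage.handle_task_success(0, 5)  # resubmitted copy of partition 0: ignored
print(stage.accumulator, stage.listener_calls)  # 12 2
```

The guard mirrors the existing once-per-partition listener behavior, so duplicate task completions become harmless no-ops.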
Recently upgraded to 1.1.0. Saw a bunch of fetch failures for one of the
workloads. Tried tracing the problem through change set analysis. Looks
like the offending commit is 4fde28c from Aug 4th for PR1707. Please see
SPARK-3633 for more details.
Thanks,
Nishkam
It seems like you just need to raise the ulimit?
On Sun, Sep 21, 2014 at 8:41 PM, Nishkam Ravi nr...@cloudera.com wrote:
Hi Evan,
Sorry, I forgot to mention it. I set K to 10 for the benchmark study.
On Friday 19 September 2014 11:24 PM, Evan R. Sparks wrote:
Hey Meethu - what are you setting K to in the benchmarks you show?
This can greatly affect the runtime.
On Thu, Sep 18, 2014 at
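Evan's point above -- that K strongly affects runtime -- follows from the assignment step of (naive) k-means, which compares every point against every center, so each iteration costs O(n * k * d). A minimal pure-Python sketch of that step, illustrative only and not the MLlib implementation:

```python
def assign(points, centers):
    """One k-means assignment step.

    Every point is compared against every center, so the work per
    iteration grows linearly with k (the number of centers).
    """
    labels = []
    for p in points:
        # index of the nearest center by squared Euclidean distance
        best = min(range(len(centers)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
        labels.append(best)
    return labels

print(assign([(0.0, 0.0), (9.0, 9.0)], [(1.0, 1.0), (10.0, 10.0)]))  # [0, 1]
```

Doubling K roughly doubles the distance computations per iteration, which is why benchmark numbers are hard to compare without knowing K.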
Hey, the numbers you mentioned don't quite line up -- did you mean PR 2711?
On Sun, Sep 21, 2014 at 8:45 PM, Reynold Xin r...@databricks.com wrote:
Ah, I see -- it was SPARK-2711 (and PR1707). In that case, it's possible
that the patch is causing more spilling, and so the filesystem is opening
more files. I would try increasing the ulimit.
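On the ulimit point: a process can raise its soft open-file limit up to the hard limit without privileges; raising the hard limit itself needs root (or a limits.conf change on each node plus restarting the daemons). A small illustration using Python's standard resource module -- on a real cluster you would normally set `ulimit -n` in the environment that launches the executors instead:

```python
import resource

def raise_nofile_soft_limit():
    """Raise this process's soft open-file limit up to the hard limit.

    An unprivileged process may move its soft limit anywhere up to the
    hard limit; going beyond the hard limit requires root.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < hard and hard != resource.RLIM_INFINITY:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

soft, hard = raise_nofile_soft_limit()
print(soft, hard)  # soft now matches hard (unless hard is unlimited)
```

This only affects the current process and its children, which is also why the limit has to be raised in the shell (or service definition) that actually starts the executor JVMs.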
How much memory do your executors have?
- Patrick
On Sun, Sep 21, 2014 at 10:29
Thanks for the quick follow-up, Reynold and Patrick. Tried a run with a
significantly higher ulimit; it doesn't seem to help. The executors have 35GB
each. Btw, with a recent version of the branch, the error message is fetch
failures as opposed to too many open files. Not sure if they are
related.