Yeah, this wasn't detected in our performance tests. We even have a test in PySpark that I would have thought might catch this (it just schedules a bunch of really small tasks, similar to the regression case):
https://github.com/databricks/spark-perf/blob/master/pyspark-tests/tests.py#L51
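
For anyone who hasn't looked at it, the shape of that kind of scheduler-throughput test is roughly the following. This is just a sketch, not the spark-perf code itself; the app name, task count, and timing logic are illustrative:

    import time
    from pyspark import SparkContext

    sc = SparkContext(appName="scheduler-throughput-sketch")

    # Many partitions => many very small tasks. Each task does almost no
    # work, so the measured time is dominated by task scheduling and
    # launch overhead rather than computation.
    num_tasks = 10000  # illustrative; not the spark-perf value
    start = time.time()
    sc.parallelize(range(num_tasks), num_tasks).map(lambda x: x).count()
    elapsed = time.time() - start

    print("Ran %d trivial tasks in %.2f seconds" % (num_tasks, elapsed))
    sc.stop()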
Anyway, Josh is trying to repro the regression to see if we can figure out
what is going on. If we find something, we should definitely add a test.

On Mon, Sep 1, 2014 at 10:04 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> Nope, actually, they didn't find that (they found some other things that
> were fixed, as well as some improvements). Feel free to send a PR, but it
> would be good to profile the issue first to understand what slowed down.
> (For example, is the map phase taking longer or the reduce phase? Is there
> some difference in the lengths of specific tasks, etc.?)
>
> Matei
>
> On September 1, 2014 at 10:03:20 PM, Nicholas Chammas
> (nicholas.cham...@gmail.com) wrote:
>
> Oh, that's sweet. So, a related question then.
>
> Did those tests pick up the performance issue reported in SPARK-3333? Does
> it make sense to add a new test to cover that case?
>
> On Tue, Sep 2, 2014 at 12:29 AM, Matei Zaharia <matei.zaha...@gmail.com>
> wrote:
>
> Hi Nicholas,
>
> At Databricks we already run https://github.com/databricks/spark-perf for
> each release, which is a more comprehensive performance test suite.
>
> Matei
>
> On September 1, 2014 at 8:22:05 PM, Nicholas Chammas
> (nicholas.cham...@gmail.com) wrote:
>
> What do people think of running the Big Data Benchmark
> <https://amplab.cs.berkeley.edu/benchmark/> (repo
> <https://github.com/amplab/benchmark>) as part of preparing every new
> release of Spark?
>
> We'd run it just for Spark and effectively use it as another type of test
> to track any performance progress or regressions from release to release.
>
> Would doing such a thing be valuable? Do we already have a way of
> benchmarking Spark performance that we use regularly?
>
> Nick
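
P.S. On Matei's profiling suggestion above, one crude way to localize the
slowdown is to time a map-only job against the same map followed by a
shuffle, and attribute the difference to the reduce side. A rough sketch
(data sizes and key spread are made up for illustration; repeated runs
would be needed for a real comparison):

    import time
    from pyspark import SparkContext

    sc = SparkContext(appName="phase-timing-sketch")
    data = sc.parallelize(range(1000000), 200)

    # Time a map-only job (no shuffle) to isolate the map phase.
    start = time.time()
    data.map(lambda x: (x % 100, x)).count()
    map_time = time.time() - start

    # Time the same map followed by a reduce (shuffle); the difference is
    # a rough estimate of the shuffle/reduce cost.
    start = time.time()
    data.map(lambda x: (x % 100, x)).reduceByKey(lambda a, b: a + b).count()
    total_time = time.time() - start

    print("map-only: %.2fs, map+reduce: %.2fs, reduce overhead ~%.2fs"
          % (map_time, total_time, total_time - map_time))
    sc.stop()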