Yeah, this wasn't detected in our performance tests. We even have a
test in PySpark that I would have thought might catch this (it just
schedules a bunch of really small tasks, similar to the regression
case).

https://github.com/databricks/spark-perf/blob/master/pyspark-tests/tests.py#L51
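
For anyone who hasn't looked at that test, the gist is just timing a
job made of many near-empty tasks. A minimal sketch of that kind of
scheduler-throughput check (the names and numbers here are
placeholders, not the actual spark-perf parameters):

    import time
    from pyspark import SparkContext

    # Sketch of a tiny-task microbenchmark: run a job whose tasks do
    # almost no work, so wall-clock time is dominated by task
    # scheduling and serialization overhead.
    sc = SparkContext(appName="tiny-task-sketch")

    num_tasks = 10000  # one near-empty partition per task
    start = time.time()
    # count() launches a job with num_tasks tasks, each over a single
    # integer, so per-task work is negligible.
    sc.parallelize(range(num_tasks), num_tasks).count()
    elapsed = time.time() - start

    print("%d tiny tasks in %.2f s (%.1f tasks/s)"
          % (num_tasks, elapsed, num_tasks / elapsed))
    sc.stop()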

Anyway, Josh is trying to repro the regression to see if we can
figure out what is going on. If we find something, we should
definitely add a test.
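
In the meantime, one quick way to narrow down Matei's question below
(is it the map phase or the reduce phase?) would be to time a map-only
job and a shuffle job separately on both releases. A rough sketch
(sizes are placeholders, and per-task durations are easier to eyeball
in the web UI):

    import time
    from pyspark import SparkContext

    sc = SparkContext(appName="phase-timing-sketch")

    def timed(label, action):
        start = time.time()
        action()
        print("%s: %.2f s" % (label, time.time() - start))

    pairs = sc.parallelize(range(1000000), 100).map(lambda x: (x % 1000, 1))
    pairs.cache().count()  # materialize input so later timings exclude it

    # Map-only job: no shuffle, so a slowdown here points at the map side.
    timed("map-only", lambda: pairs.map(lambda kv: kv).count())

    # Adds a reduce phase on top of the same map-side work.
    timed("shuffle", lambda: pairs.reduceByKey(lambda a, b: a + b).count())

    sc.stop()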

On Mon, Sep 1, 2014 at 10:04 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> Nope, actually, they didn't find that (they found some other things that were
> fixed, as well as some improvements). Feel free to send a PR, but it would be
> good to profile the issue first to understand what slowed down. (For example,
> is the map phase taking longer, or is it the reduce phase? Is there some
> difference in the lengths of specific tasks, etc.?)
>
> Matei
>
> On September 1, 2014 at 10:03:20 PM, Nicholas Chammas 
> (nicholas.cham...@gmail.com) wrote:
>
> Oh, that's sweet. So, a related question then.
>
> Did those tests pick up the performance issue reported in SPARK-3333? Does it 
> make sense to add a new test to cover that case?
>
>
> On Tue, Sep 2, 2014 at 12:29 AM, Matei Zaharia <matei.zaha...@gmail.com> 
> wrote:
> Hi Nicholas,
>
> At Databricks we already run https://github.com/databricks/spark-perf for 
> each release, which is a more comprehensive performance test suite.
>
> Matei
>
> On September 1, 2014 at 8:22:05 PM, Nicholas Chammas 
> (nicholas.cham...@gmail.com) wrote:
>
> What do people think of running the Big Data Benchmark
> <https://amplab.cs.berkeley.edu/benchmark/> (repo
> <https://github.com/amplab/benchmark>) as part of preparing every new
> release of Spark?
>
> We'd run it just for Spark and effectively use it as another type of test
> to track performance improvements or regressions from release to release.
>
> Would doing such a thing be valuable? Do we already have a way of
> benchmarking Spark performance that we use regularly?
>
> Nick
>
