Hi Tien-Dung,

Spark 1.6 plans a single distinct aggregate in the same way as multiple distinct aggregates; this inherently adds some overhead, but it is more stable when the distinct columns have high cardinality. You can revert to the old behavior by setting the spark.sql.specializeSingleDistinctAggPlanning option to false.

See also:
https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala#L452-L462
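As a minimal, untested sketch of how you could apply this from the spark-shell (re-using the sqlText value from your snippet below):

  // Revert to the old planning of single distinct aggregates,
  // either programmatically on the SQLContext...
  sqlContext.setConf("spark.sql.specializeSingleDistinctAggPlanning", "false")

  // ...or via SQL, before the aggregation is planned:
  sqlContext.sql("SET spark.sql.specializeSingleDistinctAggPlanning=false")

  // Then re-run the grouping-sets query:
  val rs1 = sqlContext.sql(sqlText)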
HTH

Kind regards,
Herman van Hövell

2016-02-12 16:23 GMT+01:00 Le Tien Dung <tiendzung...@gmail.com>:

> Hi folks,
>
> I have compared the performance of Spark SQL version 1.6.0 and version
> 1.5.2. In a simple case, Spark 1.6.0 is quite a bit faster than Spark 1.5.2.
> However, in a more complex query - in our case an aggregation query with
> grouping sets - Spark SQL version 1.6.0 is very much slower than Spark SQL
> version 1.5. Could any of you kindly let us know a workaround for this
> performance regression?
>
> Here is our test scenario:
>
> case class Toto(
>   a: String = f"${(math.random*1e6).toLong}%06.0f",
>   b: String = f"${(math.random*1e6).toLong}%06.0f",
>   c: String = f"${(math.random*1e6).toLong}%06.0f",
>   n: Int = (math.random*1e3).toInt,
>   m: Double = (math.random*1e3))
>
> val data = sc.parallelize(1 to 1e6.toInt).map(i => Toto())
> val df: org.apache.spark.sql.DataFrame = sqlContext.createDataFrame(data)
>
> df.registerTempTable("toto")
> val sqlSelect  = "SELECT a, b, COUNT(1) AS k1, COUNT(DISTINCT n) AS k2, SUM(m) AS k3"
> val sqlGroupBy = "FROM toto GROUP BY a, b GROUPING SETS ((a,b),(a),(b))"
> val sqlText    = s"$sqlSelect $sqlGroupBy"
>
> val rs1 = sqlContext.sql(sqlText)
> rs1.saveAsParquetFile("rs1")
>
> The query is executed from a spark-shell in local mode with
> --driver-memory=1G. Screenshots from the Spark UI are accessible at
> http://i.stack.imgur.com/VujQY.png (Spark 1.5.2) and
> http://i.stack.imgur.com/Hlg95.png (Spark 1.6.0). The DAG on Spark 1.6.0
> can be viewed at http://i.stack.imgur.com/u3HrG.png.
>
> Many thanks and looking forward to hearing from you,
> Tien-Dung Le