Hi Tien-Dung,

Spark 1.6 plans a single distinct aggregate in the same way as multiple distinct aggregates; this inherently adds some overhead, but it is more stable when the distinct columns have high cardinality. You can revert to the old behavior by setting the spark.sql.specializeSingleDistinctAggPlanning option to false.

See also:
https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala#L452-L462
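As a minimal, untested sketch of how you could apply this from the spark-shell (re-using the sqlText value from your snippet below):

  // Revert to the old planning of single distinct aggregates,
  // either programmatically on the SQLContext...
  sqlContext.setConf("spark.sql.specializeSingleDistinctAggPlanning", "false")

  // ...or via SQL, before the aggregation is planned:
  sqlContext.sql("SET spark.sql.specializeSingleDistinctAggPlanning=false")

  // Then re-run the grouping-sets query:
  val rs1 = sqlContext.sql(sqlText)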
HTH

Kind regards,
Herman van Hövell

2016-02-12 16:23 GMT+01:00 Le Tien Dung <tiendzung...@gmail.com>:

> Hi folks,
>
> I have compared the performance of Spark SQL version 1.6.0 and version
> 1.5.2. In a simple case, Spark 1.6.0 is quite a bit faster than Spark 1.5.2.
> However, in a more complex query - in our case an aggregation query with
> grouping sets - Spark SQL version 1.6.0 is very much slower than Spark SQL
> version 1.5. Could any of you kindly let us know a workaround for this
> performance regression?
>
> Here is our test scenario:
>
> case class Toto(
>   a: String = f"${(math.random*1e6).toLong}%06.0f",
>   b: String = f"${(math.random*1e6).toLong}%06.0f",
>   c: String = f"${(math.random*1e6).toLong}%06.0f",
>   n: Int = (math.random*1e3).toInt,
>   m: Double = (math.random*1e3))
>
> val data = sc.parallelize(1 to 1e6.toInt).map(i => Toto())
> val df: org.apache.spark.sql.DataFrame = sqlContext.createDataFrame(data)
>
> df.registerTempTable("toto")
> val sqlSelect  = "SELECT a, b, COUNT(1) AS k1, COUNT(DISTINCT n) AS k2, SUM(m) AS k3"
> val sqlGroupBy = "FROM toto GROUP BY a, b GROUPING SETS ((a,b),(a),(b))"
> val sqlText    = s"$sqlSelect $sqlGroupBy"
>
> val rs1 = sqlContext.sql(sqlText)
> rs1.saveAsParquetFile("rs1")
>
> The query is executed from a spark-shell in local mode with
> --driver-memory=1G. Screenshots from the Spark UI are accessible at
> http://i.stack.imgur.com/VujQY.png (Spark 1.5.2) and
> http://i.stack.imgur.com/Hlg95.png (Spark 1.6.0). The DAG on Spark 1.6.0
> can be viewed at http://i.stack.imgur.com/u3HrG.png.
>
> Many thanks and looking forward to hearing from you,
> Tien-Dung Le