[ https://issues.apache.org/jira/browse/SPARK-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973786#comment-14973786 ]
Wenchen Fan commented on SPARK-11253:
-------------------------------------

There is one more issue with the SQL metrics: we use -1 as the initial value of the "dataSize" and "spillSize" metrics to work around https://issues.apache.org/jira/browse/SPARK-11013. For accumulators in a physical plan, we only have the final aggregated value (currently we only have `sum`), unlike SQLListener, which keeps a value for every task. So the -1 sentinels leak into the final aggregated values. Code to confirm it:

{code}
val metrics = ArrayBuffer.empty[Long]
val listener = new QueryExecutionListener {
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
  override def onSuccess(funcName: String, qe: QueryExecution, duration: Long): Unit = {
    metrics += qe.executedPlan.longMetric("dataSize").value.value
    val bottomAgg = qe.executedPlan.children(0).children(0)
    metrics += bottomAgg.longMetric("dataSize").value.value
  }
}
sqlContext.listenerManager.register(listener)
withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "2") {
  val df = sqlContext.sparkContext.makeRDD(Seq(1 -> "a", 2 -> "b"), 2)
    .toDF("i", "j").groupBy("i").count()
  df.collect()
  assert(metrics(0) - metrics(1) == 2)
}
{code}

But the impact is quite small; we can fix it later.

> reset all accumulators in physical operators before execute an action
> ---------------------------------------------------------------------
>
>                 Key: SPARK-11253
>                 URL: https://issues.apache.org/jira/browse/SPARK-11253
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>             Fix For: 1.6.0
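To make the arithmetic behind the observation concrete, here is a minimal standalone sketch (not Spark code; the names and per-task byte counts are hypothetical) of why a -1 sentinel initial value skews a sum-aggregated metric: each task's value sits on top of the -1 sentinel, so the plan-level sum under-counts by one per task — which is why the test above sees a difference of exactly 2 with 2 shuffle partitions.

{code}
// Sketch: a metric initialized to -1 (the SPARK-11013 workaround) that each
// task then adds its real byte count to. Summing the per-task values
// under-counts the true total by one per task.
object SentinelMetricDemo {
  def main(args: Array[String]): Unit = {
    val initial = -1L
    val perTaskBytes = Seq(100L, 250L)         // hypothetical dataSize per task
    // Each task's observed accumulator value includes the -1 sentinel.
    val perTaskValues = perTaskBytes.map(initial + _)
    val aggregated = perTaskValues.sum         // what the plan-level metric reports
    val actual = perTaskBytes.sum              // the true total
    println(s"aggregated=$aggregated actual=$actual offBy=${actual - aggregated}")
  }
}
{code}

With 2 tasks the aggregated value is off by 2, matching the `assert(metrics(0) - metrics(1) == 2)` check in the confirmation code.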