[ https://issues.apache.org/jira/browse/SPARK-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973786#comment-14973786 ]

Wenchen Fan commented on SPARK-11253:
-------------------------------------

There is one more issue with the SQL metrics:

We use -1 as the initial value of the "dataSize" and "spillSize" metrics, to work
around https://issues.apache.org/jira/browse/SPARK-11013.
For accumulators in the physical plan, we only have a final aggregated
value (currently we only have `sum`), unlike SQLListener (which keeps a value
for every task). So the -1 initial values affect the final aggregated values.
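To see how a -1 seed leaks into a `sum`-aggregated metric, here is a minimal
sketch in plain Scala (the per-task sizes are made-up numbers, not from any
real query; this is the arithmetic only, not the actual metrics code):

{code}
// Hypothetical true per-task "dataSize" values.
val trueSizes = Seq(10L, 20L)

// With the SPARK-11013 workaround, each per-task value starts from -1
// instead of 0, so every task's reported value is off by one.
val reported = trueSizes.map(-1L + _)

// Aggregating with `sum` then undercounts by one per task.
assert(reported.sum == trueSizes.sum - trueSizes.size)  // 28 == 30 - 2
{code}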

Code to confirm it:
{code}
    // Runs inside a Spark SQL test suite that provides `sqlContext`,
    // `withSQLConf` and the DataFrame implicits (e.g. via SQLTestUtils).
    import scala.collection.mutable.ArrayBuffer
    import org.apache.spark.sql.execution.QueryExecution
    import org.apache.spark.sql.util.QueryExecutionListener

    // Collects the aggregated "dataSize" metric of the top aggregate and of
    // the bottom (partial) aggregate once the query finishes.
    val metrics = ArrayBuffer.empty[Long]
    val listener = new QueryExecutionListener {
      override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}

      override def onSuccess(funcName: String, qe: QueryExecution, duration: Long): Unit = {
        metrics += qe.executedPlan.longMetric("dataSize").value.value
        val bottomAgg = qe.executedPlan.children(0).children(0)
        metrics += bottomAgg.longMetric("dataSize").value.value
      }
    }
    sqlContext.listenerManager.register(listener)

    withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "2") {
      val df = sqlContext.sparkContext.makeRDD(Seq(1 -> "a", 2 -> "b"), 2)
        .toDF("i", "j").groupBy("i").count()
      df.collect()
      // The -1 initial values leak into the aggregated sums: the two
      // metrics end up differing by a fixed offset of 2 here.
      assert(metrics(0) - metrics(1) == 2)
    }
{code}

But the impact is quite small; we can fix it later.

> reset all accumulators in physical operators before executing an action
> ------------------------------------------------------------------------
>
>                 Key: SPARK-11253
>                 URL: https://issues.apache.org/jira/browse/SPARK-11253
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>             Fix For: 1.6.0
>
>



