Marcelo Vanzin created SPARK-26330:
--------------------------------------

Summary: Duplicate query execution events generated for SQL commands
Key: SPARK-26330
URL: https://issues.apache.org/jira/browse/SPARK-26330
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: Marcelo Vanzin
Consider the following code:

{code:scala}
spark.sql("create table foo (bar int)").show()
{code}

The command is executed eagerly (i.e. before {{show()}} is called) and generates a query execution event. But when you call {{show()}}, a duplicate event is generated, even though Spark does not execute anything at that point.

This is more misleading for something like a CTAS, because the duplicate events may lead listeners to conclude that there were multiple inserts when there was only one.

A fuller example that shows this (in the output you can see that both inputs to the listener are the same):

{code:scala}
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Listener that prints every query execution event it receives.
val lsnr = new QueryExecutionListener() {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    println(s"on success: $funcName -> ${qe.analyzed}")
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
    println(s"on failure: $funcName -> ${qe.analyzed}")
  }
}

spark.sessionState.listenerManager.register(lsnr)

spark.sql("drop table if exists test")

// The CREATE TABLE command runs eagerly here and generates one event...
val df = spark.sql("create table test(i int)")

// ...and show() generates a second event, although nothing new is executed.
df.show()
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)