[ https://issues.apache.org/jira/browse/SPARK-19628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869880#comment-15869880 ]

Jork Zijlstra commented on SPARK-19628:
---------------------------------------

I have just attached a screenshot showing the duplicate jobs when executing the 
example code given above. 

The example code uses show(), but in our application we use collect(); both 
seem to trigger the duplication. 
The issue is that both jobs take time, so the execution time for the same 
action has doubled. A minimal sketch of the collect() variant follows below.
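
For reference, this is roughly how we hit it with collect() (a minimal sketch, assuming the same local setup and ORC source placeholder as in the example code below; the object name and println are just for illustration):

{code}
import org.apache.spark.sql._

object DoubleJobsCollect {
  def main(args: Array[String]): Unit = {
    val sparkSession: SparkSession = SparkSession.builder
      .master("local[4]")
      .appName("spark session example")
      .getOrCreate()

    val path = "" // some orc source, placeholder as in the example code

    // collect() instead of show(): the duplicate job appears in the UI here as well
    val rows = sparkSession.read.orc(path).collect()
    println(s"collected ${rows.length} rows")
  }
}
{code}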

> Duplicate Spark jobs in 2.1.0
> -----------------------------
>
>                 Key: SPARK-19628
>                 URL: https://issues.apache.org/jira/browse/SPARK-19628
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Jork Zijlstra
>             Fix For: 2.0.1
>
>         Attachments: spark2.0.1.png, spark2.1.0-examplecode.png, 
> spark2.1.0.png
>
>
> After upgrading to Spark 2.1.0 we noticed that there are duplicate jobs 
> being executed. Going back to Spark 2.0.1, they are gone again.
> {code}
> import org.apache.spark.sql._
> object DoubleJobs {
>   def main(args: Array[String]) {
>     System.setProperty("hadoop.home.dir", "/tmp");
>     val sparkSession: SparkSession = SparkSession.builder
>       .master("local[4]")
>       .appName("spark session example")
>       .config("spark.driver.maxResultSize", "6G")
>       .config("spark.sql.orc.filterPushdown", true)
>       .config("spark.sql.hive.metastorePartitionPruning", true)
>       .getOrCreate()
>     sparkSession.sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
>     val paths = Seq(
>       ""//some orc source
>     )
>     def dataFrame(path: String): DataFrame = {
>       sparkSession.read.orc(path)
>     }
>     paths.foreach(path => {
>       dataFrame(path).show(20)
>     })
>   }
> }
> {code}


