[
https://issues.apache.org/jira/browse/SPARK-19628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869880#comment-15869880
]
Jork Zijlstra edited comment on SPARK-19628 at 2/16/17 1:07 PM:
----------------------------------------------------------------
I have just attached a screenshot showing the duplicate jobs that appear when
executing the example code given above.
The example code uses show(), but in our application we use collect(); both
seem to trigger the duplication.
The issue is that both jobs take time (they are executed sequentially), so the
execution time has doubled for the same action.
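One way to confirm the duplication empirically, rather than reading it off the UI screenshots, is to count job submissions with a SparkListener. A minimal sketch, assuming a local SparkSession; the object name and ORC path are illustrative, not from the report:

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
import org.apache.spark.sql.SparkSession

object CountJobs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[4]")
      .appName("count jobs")
      .getOrCreate()

    // Count every job submitted on this SparkContext.
    val jobCount = new AtomicInteger(0)
    spark.sparkContext.addSparkListener(new SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit =
        jobCount.incrementAndGet()
    })

    // Hypothetical ORC path; substitute a real source to reproduce.
    spark.read.orc("/path/to/orc").show(20)

    // On 2.0.1 a single job is expected for the show();
    // on 2.1.0 this report observes two.
    println(s"jobs submitted: ${jobCount.get}")

    spark.stop()
  }
}
```

The same counter works for collect(), since the listener fires on every job submission regardless of which action triggered it.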
was (Author: jzijlstra):
I have just attached a screenshot showing the duplicate jobs that appear when
executing the example code given above.
The example code uses show(), but in our application we use collect(); both
seem to trigger the duplication.
The issue is that both jobs take time, so the execution time has doubled for
the same action.
> Duplicate Spark jobs in 2.1.0
> -----------------------------
>
> Key: SPARK-19628
> URL: https://issues.apache.org/jira/browse/SPARK-19628
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Jork Zijlstra
> Fix For: 2.0.1
>
> Attachments: spark2.0.1.png, spark2.1.0-examplecode.png,
> spark2.1.0.png
>
>
> After upgrading to Spark 2.1.0 we noticed that there are duplicate jobs
> executed. Going back to Spark 2.0.1, they are gone again.
> {code}
> import org.apache.spark.sql._
>
> object DoubleJobs {
>   def main(args: Array[String]) {
>     System.setProperty("hadoop.home.dir", "/tmp")
>
>     val sparkSession: SparkSession = SparkSession.builder
>       .master("local[4]")
>       .appName("spark session example")
>       .config("spark.driver.maxResultSize", "6G")
>       .config("spark.sql.orc.filterPushdown", true)
>       .config("spark.sql.hive.metastorePartitionPruning", true)
>       .getOrCreate()
>
>     sparkSession.sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
>
>     val paths = Seq(
>       "" //some orc source
>     )
>
>     def dataFrame(path: String): DataFrame = {
>       sparkSession.read.orc(path)
>     }
>
>     paths.foreach(path => {
>       dataFrame(path).show(20)
>     })
>   }
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)