I think the Spark cluster receives two job submissions, A and B, and that FAIR scheduling is used to arbitrate between A and B. I am not sure about this.
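One caveat worth noting: FAIR mode only arbitrates between jobs that are in flight at the same time, so the two submissions must come from separate driver threads. A minimal sketch using Scala Futures, assuming an existing SparkContext `sc`, RDDs `rdd1`/`rdd2` as in the code below, spark.scheduler.mode=FAIR, and illustrative pool names "poolA"/"poolB":

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    // Each Future runs on its own thread, so both sum() jobs are
    // submitted concurrently and FAIR scheduling can interleave them.
    // setLocalProperty is thread-local, so each job lands in its own pool.
    val jobA = Future {
      sc.setLocalProperty("spark.scheduler.pool", "poolA")
      rdd1.sum()
    }
    val jobB = Future {
      sc.setLocalProperty("spark.scheduler.pool", "poolB")
      rdd2.sum()
    }
    Await.result(Future.sequence(Seq(jobA, jobB)), Duration.Inf)

Without the separate threads, FAIR mode has no effect on this workload: each action blocks the driver until it finishes, so only one job exists at a time.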
---Original---
From: "Bryan Jeffrey" <bryan.jeff...@gmail.com>
Date: 2017/6/27 08:55:42
To: "satishl" <satish.la...@gmail.com>
Cc: "user" <user@spark.apache.org>
Subject: Re: Question about Parallel Stages in Spark

Hello.

The driver is running the individual operations in series, but each operation is parallelized internally. If you want them run in parallel, you need to provide the driver a mechanism to thread out the job scheduling:

    val rdd1 = sc.parallelize(1 to 100000)
    val rdd2 = sc.parallelize(1 to 200000)
    val thingsToDo: ParArray[(RDD[Int], Int)] = Array(rdd1, rdd2).zipWithIndex.par
    thingsToDo.foreach { case (rdd, index) =>
      for (i <- 1 to 10000)
        logger.info(s"Index ${index} - ${rdd.sum()}")
    }

This will run both operations in parallel.

On Mon, Jun 26, 2017 at 8:10 PM, satishl <satish.la...@gmail.com> wrote:

    For the code below, since rdd1 and rdd2 don't depend on each other, I was
    expecting the first and second printlns to be interwoven. However, the
    Spark job runs all "first" statements first and then all "second"
    statements, in serial fashion. I have set spark.scheduler.mode = FAIR.
    Obviously my understanding of parallel stages is wrong. What am I missing?

        val rdd1 = sc.parallelize(1 to 1000000)
        val rdd2 = sc.parallelize(1 to 1000000)
        for (i <- (1 to 100)) println("first: " + rdd1.sum())
        for (i <- (1 to 100)) println("second: " + rdd2.sum())

    --
    View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Question-about-Parallel-Stages-in-Spark-tp28793.html
    Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org