For the code below, since rdd1 and rdd2 don't depend on each other, I was
expecting the "first" and "second" printlns to be interwoven. However, the
Spark job runs all "first" statements first and then all "second" statements,
in serial fashion. I have set spark.scheduler.mode = FAIR.
Obviously my understanding of parallel stages is wrong. What am I missing?

    val rdd1 = sc.parallelize(1 to 1000000)
    val rdd2 = sc.parallelize(1 to 1000000)

    // Two independent jobs: neither RDD depends on the other.
    for (i <- 1 to 100)
      println("first: " + rdd1.sum())
    for (i <- 1 to 100)
      println("second: " + rdd2.sum())


