I think the Spark cluster receives two submits, A and B, and that the FAIR
scheduler is used to schedule A and B. I am not sure about this, though.
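For reference, FAIR mode within a single application is enabled via the
spark.scheduler.mode property. A minimal sketch (the app name below is a
placeholder, not from this thread):

import org.apache.spark.{SparkConf, SparkContext}

// Switch the in-application job scheduler from the default FIFO to FAIR.
val conf = new SparkConf()
  .setAppName("fair-mode-sketch") // hypothetical app name
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)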


 
---Original---
From: "Bryan Jeffrey"<bryan.jeff...@gmail.com>
Date: 2017/6/27 08:55:42
To: "satishl"<satish.la...@gmail.com>;
Cc: "user"<user@spark.apache.org>;
Subject: Re: Question about Parallel Stages in Spark


Hello.

The driver is running the individual operations in series, but each operation
is parallelized internally. If you want them to run in parallel, you need to
give the driver a mechanism to submit the jobs from multiple threads:


import org.apache.spark.rdd.RDD
import org.slf4j.LoggerFactory
import scala.collection.parallel.mutable.ParArray

val logger = LoggerFactory.getLogger(getClass)

val rdd1 = sc.parallelize(1 to 100000)
val rdd2 = sc.parallelize(1 to 200000)

// Converting the array to a parallel collection makes foreach run its body
// from multiple threads, so the Spark jobs are submitted concurrently.
val thingsToDo: ParArray[(RDD[Int], Int)] = Array(rdd1, rdd2).zipWithIndex.par

thingsToDo.foreach { case (rdd, index) =>
  for (i <- 1 to 10000)
    logger.info(s"Index ${index} - ${rdd.sum()}")
}


This will run both operations in parallel.
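An alternative sketch of the same idea using Scala Futures, which avoids the
parallel-collections dependency (this assumes the global execution context and
is not from the original message):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Each Future body runs on its own thread, so the two Spark jobs are
// submitted to the scheduler concurrently.
val first  = Future { rdd1.sum() }
val second = Future { rdd2.sum() }

println("first: "  + Await.result(first, Duration.Inf))
println("second: " + Await.result(second, Duration.Inf))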




On Mon, Jun 26, 2017 at 8:10 PM, satishl <satish.la...@gmail.com> wrote:
For the below code, since rdd1 and rdd2 don't depend on each other, I was
 expecting that the "first" and "second" printlns would be interwoven. However,
 the Spark job runs all "first" statements first and then all "second"
 statements next in serial fashion. I have set spark.scheduler.mode = FAIR.
 Obviously my understanding of parallel stages is wrong. What am I missing?
 
     val rdd1 = sc.parallelize(1 to 1000000)
     val rdd2 = sc.parallelize(1 to 1000000)
 
     for (i <- (1 to 100))
       println("first: " + rdd1.sum())
     for (i <- (1 to 100))
       println("second: " + rdd2.sum())
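 Worth noting: sum() is a blocking action, so the driver submits each job and
 waits for it to finish before submitting the next; spark.scheduler.mode = FAIR
 only arbitrates among jobs submitted concurrently from separate threads. A
 minimal sketch with plain threads and hypothetical FAIR pool names ("pool1"
 and "pool2", which would be defined in fairscheduler.xml):

 // Each thread tags its jobs with its own FAIR pool via a thread-local
 // property, so the FAIR scheduler can share executors between the loops.
 val t1 = new Thread(new Runnable {
   def run(): Unit = {
     sc.setLocalProperty("spark.scheduler.pool", "pool1") // hypothetical pool
     for (i <- 1 to 100) println("first: " + rdd1.sum())
   }
 })
 val t2 = new Thread(new Runnable {
   def run(): Unit = {
     sc.setLocalProperty("spark.scheduler.pool", "pool2") // hypothetical pool
     for (i <- 1 to 100) println("second: " + rdd2.sum())
   }
 })
 t1.start(); t2.start()
 t1.join(); t2.join()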
 
 
 