Hi

I don't think the Spark cluster receives two submits; it will execute one
submit and then move on to the next one. If the application is multithreaded
and two threads call Spark actions at the same time, then the jobs will run
in parallel, provided the scheduler is FAIR and task slots are available.
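
For example, something along these lines should exercise that path (just a
rough sketch, not tested; the pool names "poolA"/"poolB" are arbitrary, and
spark.scheduler.mode must already be set to FAIR on the SparkContext):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// sc is a SparkContext created with spark.scheduler.mode=FAIR
val rddA = sc.parallelize(1 to 1000000)
val rddB = sc.parallelize(1 to 1000000)

// Each Future runs on its own thread, so the two sum() jobs are
// submitted concurrently and FAIR scheduling can interleave them.
val jobA = Future {
  sc.setLocalProperty("spark.scheduler.pool", "poolA")
  rddA.sum()
}
val jobB = Future {
  sc.setLocalProperty("spark.scheduler.pool", "poolB")
  rddB.sum()
}

println(Await.result(jobA, Duration.Inf))
println(Await.result(jobB, Duration.Inf))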

But within a single thread, one submit will complete before the next one
starts. If there are independent stages within one job, then those will run
in parallel.
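
As an illustration of independent stages inside one job (again just a
sketch): a join produces two map-side stages, one per input, with no
dependency between them, so the scheduler can run their tasks concurrently:

val left  = sc.parallelize(1 to 1000000).map(i => (i % 100, i))
val right = sc.parallelize(1 to 1000000).map(i => (i % 100, i.toString))

// One action, one job, but two independent shuffle-map stages
// (one per parent RDD) that can execute at the same time before
// the final join stage runs.
println(left.join(right).count())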

I agree with Bryan Jeffrey.


Regards
Pralabh Kumar

On Tue, Jun 27, 2017 at 9:03 AM, 萝卜丝炒饭 <1427357...@qq.com> wrote:

> I think the Spark cluster receives two submits, A and B.
> FAIR scheduling is used to schedule A and B.
> I am not sure about this.
>
> ---Original---
> From: "Bryan Jeffrey" <bryan.jeff...@gmail.com>
> Date: 2017/6/27 08:55:42
> To: "satishl" <satish.la...@gmail.com>
> Cc: "user" <user@spark.apache.org>
> Subject: Re: Question about Parallel Stages in Spark
>
> Hello.
>
> The driver runs the individual operations in series, but each
> operation is parallelized internally. If you want them to run in
> parallel, you need to give the driver a mechanism to thread out the
> job scheduling:
>
> import scala.collection.parallel.mutable.ParArray
> import org.apache.spark.rdd.RDD
>
> val rdd1 = sc.parallelize(1 to 100000)
> val rdd2 = sc.parallelize(1 to 200000)
>
> // .par converts the array into a parallel collection, so each
> // element of the foreach (and hence each job submission) runs on
> // its own thread.
> val thingsToDo: ParArray[(RDD[Int], Int)] = Array(rdd1, rdd2).zipWithIndex.par
>
> thingsToDo.foreach { case (rdd, index) =>
>   for (i <- 1 to 10000)
>     logger.info(s"Index ${index} - ${rdd.sum()}") // logger: any driver-side logger
> }
>
>
> This will run both operations in parallel.
>
>
> On Mon, Jun 26, 2017 at 8:10 PM, satishl <satish.la...@gmail.com> wrote:
>
>> For the below code, since rdd1 and rdd2 don't depend on each other, I was
>> expecting the "first" and "second" printlns to be interwoven. However,
>> the Spark job runs all "first" statements and then all "second"
>> statements, in serial fashion. I have set spark.scheduler.mode = FAIR.
>> Obviously my understanding of parallel stages is wrong. What am I missing?
>>
>>     val rdd1 = sc.parallelize(1 to 1000000)
>>     val rdd2 = sc.parallelize(1 to 1000000)
>>
>>     for (i <- (1 to 100))
>>       println("first: " + rdd1.sum())
>>     for (i <- (1 to 100))
>>       println("second" + rdd2.sum())
>>
>>
>>
>>
>>
>>
>
