I think my words were also misunderstood. My point is that they will not be
submitted together, since they are part of the same thread.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("practice")
  .config("spark.scheduler.mode", "FAIR")
  .enableHiveSupport()
  .getOrCreate()
val sc = spark.sparkContext

// collect() blocks, so the second job is only submitted after the first finishes.
sc.parallelize(List(1.to(10000000))).map(s => Thread.sleep(10000)).collect()
sc.parallelize(List(1.to(10000000))).map(s => Thread.sleep(10000)).collect()
Thread.sleep(10000000)  // keep the driver alive


I ran this, and the submit times for the two jobs are different.
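
For comparison, here is a rough, untested sketch (with the sizes reduced just for
illustration) of how the same two jobs could be submitted from separate threads,
for example with scala.concurrent.Future. Since collect() blocks, a single thread
cannot submit the second job until the first one finishes; with two threads both
jobs are in flight at once and the scheduler can run them concurrently.

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Each Future runs on its own thread, so each collect() is submitted as an
// independent job and the two jobs can overlap.
val jobA = Future { sc.parallelize(1 to 100).map(_ => Thread.sleep(1000)).collect() }
val jobB = Future { sc.parallelize(1 to 100).map(_ => Thread.sleep(1000)).collect() }

Await.result(jobA, Duration.Inf)
Await.result(jobB, Duration.Inf)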

Please let me know if I am wrong.

On Tue, Jun 27, 2017 at 9:17 AM, 萝卜丝炒饭 <1427357...@qq.com> wrote:

> My words caused a misunderstanding.
> Step 1: A is submitted to Spark.
> Step 2: B is submitted to Spark.
>
> Spark gets two independent jobs. The FAIR scheduler is used to schedule A and B.
>
> Jeffrey's code did not cause two submits.
>
>
>
> ---Original---
> *From:* "Pralabh Kumar"<pralabhku...@gmail.com>
> *Date:* 2017/6/27 12:09:27
> *To:* "萝卜丝炒饭"<1427357...@qq.com>;
> *Cc:* "user"<user@spark.apache.org>;"satishl"<satish.la...@gmail.com>;"Bryan
> Jeffrey"<bryan.jeff...@gmail.com>;
> *Subject:* Re: Question about Parallel Stages in Spark
>
> Hi
>
> I don't think Spark will receive two submits. It will execute one submit
> and then the next one. If the application is multithreaded, and two threads
> submit jobs at the same time, then they will run in parallel, provided the
> scheduler is FAIR and task slots are available.
>
> But within one thread, one submit will complete and then the other one will
> start. If there are independent stages in one job, those will run in
> parallel.
>
> I agree with Bryan Jeffrey .
>
>
> Regards
> Pralabh Kumar
>
> On Tue, Jun 27, 2017 at 9:03 AM, 萝卜丝炒饭 <1427357...@qq.com> wrote:
>
>> I think the Spark cluster receives two submits, A and B.
>> The FAIR scheduler is used to schedule A and B.
>> I am not sure about this.
>>
>> ---Original---
>> *From:* "Bryan Jeffrey"<bryan.jeff...@gmail.com>
>> *Date:* 2017/6/27 08:55:42
>> *To:* "satishl"<satish.la...@gmail.com>;
>> *Cc:* "user"<user@spark.apache.org>;
>> *Subject:* Re: Question about Parallel Stages in Spark
>>
>> Hello.
>>
>> The driver runs the individual operations in series, but each operation is
>> parallelized internally.  If you want them to run in parallel, you need to
>> give the driver a mechanism to spread the job scheduling across threads:
>>
>> import org.apache.spark.rdd.RDD
>> import scala.collection.parallel.mutable.ParArray
>>
>> val rdd1 = sc.parallelize(1 to 100000)
>> val rdd2 = sc.parallelize(1 to 200000)
>>
>> // .par turns the array into a parallel collection, so each element is
>> // processed on its own thread and the jobs are submitted concurrently.
>> val thingsToDo: ParArray[(RDD[Int], Int)] = Array(rdd1, rdd2).zipWithIndex.par
>>
>> thingsToDo.foreach { case (rdd, index) =>
>>   for (i <- 1 to 10000)
>>     logger.info(s"Index ${index} - ${rdd.sum()}")
>> }
>>
>>
>> This will run both operations in parallel.
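>>
>> One related note, as a sketch only: with spark.scheduler.mode=FAIR, each
>> thread can also be assigned its own pool via sc.setLocalProperty (the pool
>> names below are just examples), so each concurrent job gets a roughly equal
>> share of resources rather than the earlier job taking priority in the
>> default pool:
>>
>> thingsToDo.foreach { case (rdd, index) =>
>>   // Local properties apply per thread, so each worker thread gets its own pool.
>>   sc.setLocalProperty("spark.scheduler.pool", s"pool_$index")
>>   for (i <- 1 to 10000)
>>     logger.info(s"Index ${index} - ${rdd.sum()}")
>> }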
>>
>>
>> On Mon, Jun 26, 2017 at 8:10 PM, satishl <satish.la...@gmail.com> wrote:
>>
>>> For the code below, since rdd1 and rdd2 don't depend on each other, I was
>>> expecting the "first" and "second" printlns to be interleaved. However,
>>> the Spark job runs all the "first" statements and then all the "second"
>>> statements, in serial fashion. I have set spark.scheduler.mode = FAIR.
>>> Obviously my understanding of parallel stages is wrong. What am I missing?
>>>
>>>     val rdd1 = sc.parallelize(1 to 1000000)
>>>     val rdd2 = sc.parallelize(1 to 1000000)
>>>
>>>     for (i <- (1 to 100))
>>>       println("first: " + rdd1.sum())
>>>     for (i <- (1 to 100))
>>>       println("second" + rdd2.sum())
>>>
>>>
>>>
>>>
>>
>
