Hi Imran,

Ok, that makes sense for performance reasons. Thanks for bearing with
me and explaining that code with so much patience. Appreciated!
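
For anyone following the thread later, here is a minimal sketch of the pattern as I now understand it -- purely illustrative, not the actual CoarseGrainedSchedulerBackend code; the ExecutorData shape and all values are made up:

```scala
// Hypothetical sketch of the two makeOffers variants discussed in this
// thread. Variant 1 scans every alive executor; variant 2 touches only
// the one executor that just finished a task.
object MakeOffersSketch {
  final case class ExecutorData(freeCores: Int)

  val executorDataMap: Map[String, ExecutorData] = Map(
    "exec-1" -> ExecutorData(freeCores = 2),
    "exec-2" -> ExecutorData(freeCores = 0)
  )

  def executorIsAlive(id: String): Boolean = executorDataMap.contains(id)

  // Variant 1: offer resources on all alive executors, e.g. when a new
  // taskset is submitted -- cost grows with the number of executors.
  def makeOffers(): Seq[String] =
    executorDataMap
      .filter { case (id, _) => executorIsAlive(id) }
      .collect { case (id, data) if data.freeCores > 0 => id }
      .toSeq

  // Variant 2: exactly one task finished on executorId, so only that
  // executor can have gained free cores -- constant work per completion,
  // instead of rescanning all 1k executors for each of the 10k updates.
  def makeOffers(executorId: String): Seq[String] =
    executorDataMap.get(executorId) match {
      case Some(data) if executorIsAlive(executorId) && data.freeCores > 0 =>
        Seq(executorId)
      case _ => Seq.empty
    }

  def main(args: Array[String]): Unit = {
    assert(makeOffers() == Seq("exec-1"))
    assert(makeOffers("exec-1") == Seq("exec-1"))
    assert(makeOffers("exec-2").isEmpty)
    println("ok")
  }
}
```

So both variants do check liveness, but the single-executor one avoids the full scan on every task completion.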

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Thu, Jan 26, 2017 at 11:00 PM, Imran Rashid <iras...@cloudera.com> wrote:
> it is a small difference but think about what this means with a cluster
> where you have 10k tasks (perhaps 1k executors with 10 cores each).
>
> When just one task completes, you have to go through all 1k executors.
>
> On top of that, with a large cluster, task completions happen far more
> frequently, since each core in your cluster is finishing tasks
> independently, and sending those updates back to the driver -- eg., you
> expect to get 10k updates from one "wave" of tasks on your cluster.  So you
> avoid going through a list of 1k executors 10k times in just one wave of
> tasks.
>
> On Thu, Jan 26, 2017 at 9:12 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Hi Imran,
>>
>> Thanks a lot for your detailed explanation, but IMHO the difference is
>> so small that I'm surprised it merits two versions -- both check
>> whether an executor is alive -- executorIsAlive(executorId) vs
>> executorDataMap.filterKeys(executorIsAlive). A bit fishy, isn't it?
>>
>> But, on the other hand, since no one has flagged this small
>> duplication, it could be perfectly fine (it did make the code a bit
>> less obvious to me).
>>
>> Pozdrawiam,
>> Jacek Laskowski
>>
>>
>> On Thu, Jan 26, 2017 at 3:43 PM, Imran Rashid <iras...@cloudera.com>
>> wrote:
>> > one is used when exactly one task has finished -- that means you now
>> > have
>> > free resources on just that one executor, so you only need to look for
>> > something to schedule on that one.
>> >
>> > the other one is used when you want to schedule everything you can
>> > across
>> > the entire cluster.  For example, you have just submitted a new taskset,
>> > so
>> > you want to try to use any idle resources across the entire cluster.
>> > Or,
>> > for delay scheduling, you periodically retry all idle resources, in case
>> > the locality delay has expired.
>> >
>> > you could eliminate the version which takes an executorId, and always
>> > make
>> > offers across all idle hosts -- it would still be correct.  It's a small
>> > efficiency improvement to avoid having to go through the list of all
>> > resources.
>> >
>> > On Thu, Jan 26, 2017 at 5:48 AM, Jacek Laskowski <ja...@japila.pl>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> Why are there two (almost) identical makeOffers in
>> >> CoarseGrainedSchedulerBackend [1] and [2]? I can't seem to figure out
>> >> why they are there and am leaning towards considering one a duplicate.
>> >>
>> >> WDYT?
>> >>
>> >> [1]
>> >>
>> >> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L211
>> >>
>> >> [2]
>> >>
>> >> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L229
>> >>
>> >> Pozdrawiam,
>> >> Jacek Laskowski
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>> >>
>> >
>
>
