Hi Imran,

Ok, that makes sense for performance reasons. Thanks for bearing with me
and explaining that code with so much patience. Appreciated!
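For anyone else reading along in the archives, here's a toy Scala sketch of
the two variants as I now understand them. This is not the actual
CoarseGrainedSchedulerBackend code -- WorkerOffer, executorDataMap and
executorIsAlive are names borrowed from the discussion, and the
resourceOffers/launchTasks below are simplified stand-ins I made up -- but it
should show why the single-executor variant avoids walking the whole executor
map on every task completion:

import scala.collection.mutable

// Toy model only -- not the real CoarseGrainedSchedulerBackend.
case class WorkerOffer(executorId: String, host: String, freeCores: Int)
case class ExecutorData(host: String, var freeCores: Int, var alive: Boolean = true)

class ToySchedulerBackend {
  private val executorDataMap = mutable.HashMap.empty[String, ExecutorData]

  def addExecutor(id: String, host: String, cores: Int): Unit =
    executorDataMap(id) = ExecutorData(host, cores)

  private def executorIsAlive(id: String): Boolean =
    executorDataMap.get(id).exists(_.alive)

  // Variant 1: offer resources on every alive executor. Used when a whole
  // new TaskSet arrives (or a delay-scheduling retry) -- O(#executors) per call.
  def makeOffers(): Unit = {
    val offers = executorDataMap
      .filterKeys(executorIsAlive)
      .map { case (id, data) => WorkerOffer(id, data.host, data.freeCores) }
      .toIndexedSeq
    launchTasks(resourceOffers(offers))
  }

  // Variant 2: offer resources only on the executor that just freed a core.
  // Called on every single task completion, so it must not scan all executors.
  def makeOffers(executorId: String): Unit =
    if (executorIsAlive(executorId)) {
      val data = executorDataMap(executorId)
      launchTasks(resourceOffers(IndexedSeq(WorkerOffer(executorId, data.host, data.freeCores))))
    }

  // Simplified stand-ins for the scheduler's resourceOffers and launchTasks.
  private def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[String] =
    offers.filter(_.freeCores > 0).map(o => s"task on ${o.executorId}")
  private def launchTasks(tasks: Seq[String]): Unit = tasks.foreach(println)
}

With 1k executors, variant 1 walks the whole map once per call, while a wave
of 10k task completions only triggers 10k single-executor offers instead of
10k full scans -- which matches your explanation below.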
Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Thu, Jan 26, 2017 at 11:00 PM, Imran Rashid <iras...@cloudera.com> wrote:
> It is a small difference, but think about what this means with a cluster
> where you have 10k tasks (perhaps 1k executors with 10 cores each).
>
> When a single task completes, you would otherwise have to go through all
> 1k executors just to make an offer on the one executor that freed a core.
>
> On top of that, with a large cluster, task completions happen far more
> frequently, since each core in your cluster is finishing tasks
> independently and sending those updates back to the driver -- e.g., you
> expect to get 10k updates from one "wave" of tasks on your cluster. So you
> avoid going through a list of 1k executors 10k times in just one wave of
> tasks.
>
> On Thu, Jan 26, 2017 at 9:12 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Hi Imran,
>>
>> Thanks a lot for your detailed explanation, but IMHO the difference is
>> so small that I'm surprised it merits two versions -- both check
>> whether an executor is alive -- executorIsAlive(executorId) vs
>> executorDataMap.filterKeys(executorIsAlive). A bit fishy, isn't it?
>>
>> But, on the other hand, since no one has considered it a small
>> duplication, it could be perfectly fine (it did make the code a bit
>> less obvious to me).
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Thu, Jan 26, 2017 at 3:43 PM, Imran Rashid <iras...@cloudera.com> wrote:
>> > One is used when exactly one task has finished -- that means you now
>> > have free resources on just that one executor, so you only need to
>> > look for something to schedule on that one.
>> >
>> > The other one is used when you want to schedule everything you can
>> > across the entire cluster. For example, you have just submitted a new
>> > taskset, so you want to try to use any idle resources across the
>> > entire cluster. Or, for delay scheduling, you periodically retry all
>> > idle resources, in case the locality delay has expired.
>> >
>> > You could eliminate the version which takes an executorId and always
>> > make offers across all idle hosts -- it would still be correct. It's
>> > a small efficiency improvement to avoid having to go through the list
>> > of all resources.
>> >
>> > On Thu, Jan 26, 2017 at 5:48 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Why are there two (almost) identical makeOffers in
>> >> CoarseGrainedSchedulerBackend [1] and [2]? I can't seem to figure out
>> >> why they are there and am leaning towards considering one a duplicate.
>> >>
>> >> WDYT?
>> >>
>> >> [1]
>> >> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L211
>> >>
>> >> [2]
>> >> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L229
>> >>
>> >> Pozdrawiam,
>> >> Jacek Laskowski
>> >> ----
>> >> https://medium.com/@jaceklaskowski/
>> >> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
>> >> Follow me at https://twitter.com/jaceklaskowski
>> >>
>> >
>>
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org