On Sun, Sep 20, 2015 at 3:58 PM, gsvic <victora...@gmail.com> wrote:

> Concerning answers 1 and 2:
>
> 1) How Spark determines a node as a "slow node" and how slow is that?
There are two cases here:

1. If a node is busy (e.g. all of its slots are already occupied), the
   scheduler cannot schedule anything on it. See the "Delay Scheduling: A
   Simple Technique for Achieving Locality and Fairness in Cluster
   Scheduling" paper for how locality-aware scheduling is done.

2. Within the same stage, if a task is running slower than its peers, a
   copy of it can be launched speculatively on another node to mitigate
   stragglers. Search for "speculation" in the code base to find out more.

> 2) How an RDD chooses a location as a preferred location and with which
> criteria?

This is part of the RDD definition: the RDD interface itself defines
locality, via getPreferredLocations. The Spark NSDI paper already talks
about this. Why don't you do a little bit of code reading yourself?

> Could you please also include the links of the source files for the two
> questions above?
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Execution-and-Scheduling-tp14177p14226.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
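P.S. To make the speculation part concrete, it is driven by a handful of
configuration keys. A minimal Scala sketch (the app name is made up; the
spark.speculation.* keys and defaults are the standard ones):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: enabling speculative re-execution of straggler tasks.
val conf = new SparkConf()
  .setAppName("speculation-demo")             // hypothetical app name
  .set("spark.speculation", "true")           // off by default
  .set("spark.speculation.interval", "100")   // how often (ms) to check for stragglers
  .set("spark.speculation.multiplier", "1.5") // "slow" = slower than 1.5x the median task
  .set("spark.speculation.quantile", "0.75")  // fraction of tasks that must finish first
val sc = new SparkContext(conf)
```

With these set, a task still running after 75% of its stage has finished,
and taking more than 1.5x the median task time, gets a speculative copy.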
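And for the preferred-location question, here is a sketch of the hooks the
RDD interface exposes (a hypothetical custom RDD, not taken from the Spark
source; the override names are the real ones on the RDD class):

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical RDD that maps each partition index to the hosts holding it.
class MyLocalityAwareRDD(sc: SparkContext, hosts: Map[Int, Seq[String]])
    extends RDD[Int](sc, Nil) {

  override def getPartitions: Array[Partition] =
    hosts.keys.toArray.map(i => new Partition { override def index: Int = i })

  // The scheduler consults this per partition and tries to place the task
  // on one of the named hosts; delay scheduling handles the fallback.
  override def getPreferredLocations(split: Partition): Seq[String] =
    hosts.getOrElse(split.index, Nil)

  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator(split.index) // trivial payload, just for the sketch
}
```

So the "criteria" are entirely up to each RDD implementation: HadoopRDD,
for example, returns the block locations reported by the input format.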