Instead of setting spark.locality.wait globally, try setting the individual per-level locality waits.
Namely, set spark.locality.wait.PROCESS_LOCAL to a high value (so that process-local tasks are always scheduled when the task set has process-local tasks), and set spark.locality.wait.NODE_LOCAL and spark.locality.wait.RACK_LOCAL to low values, so that when the task set has no process-local tasks, node-local and rack-local tasks are scheduled as soon as possible. From your description, this should alleviate the problem you mentioned.

Kay's comment, IMO, is slightly general in nature, and I suspect that unless we overhaul how preferred locality is specified and allow taskset-specific scheduling hints, we can't resolve that.

Regards,
Mridul

On Thu, Nov 13, 2014 at 1:25 PM, MaChong <machon...@sina.com> wrote:
> Hi,
>
> We are running a time-sensitive application with 70 partitions of about
> 800 MB each. The application first loads data from a database in a
> different cluster, then applies a filter, caches the filtered data, then
> applies a map and a reduce, and finally collects the results.
> The application finishes in 20 seconds if we set spark.locality.wait to a
> large value (30 minutes), and takes 100 seconds if we set
> spark.locality.wait to a small value (less than 10 seconds).
> We analyzed the driver log and found many NODE_LOCAL and RACK_LOCAL level
> tasks; normally a PROCESS_LOCAL task takes only 15 seconds, but a
> NODE_LOCAL or RACK_LOCAL task takes 70 seconds.
>
> So we thought it best to set spark.locality.wait to a large value (30
> minutes), until we hit this problem:
>
> Now our application loads data from HDFS in the same Spark cluster, so it
> gets NODE_LOCAL and RACK_LOCAL level tasks during the loading stage. If
> the tasks in the loading stage all have the same locality level, either
> NODE_LOCAL or RACK_LOCAL, it works fine.
> But if the tasks in the loading stage get mixed locality levels, such as
> 3 NODE_LOCAL tasks and 2 RACK_LOCAL tasks, then the TaskSetManager of the
> loading stage submits the 3 NODE_LOCAL tasks as soon as resources are
> offered, then waits for spark.locality.wait.node, which was set to 30
> minutes, so the 2 RACK_LOCAL tasks will wait 30 minutes even though
> resources are available.
>
> Has anyone met this problem? Do you have a nice solution?
>
> Thanks
>
> Ma Chong

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
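A minimal spark-defaults.conf sketch of the per-level tuning suggested above. Note that the actual Spark configuration keys use lowercase level names (spark.locality.wait.process, .node, .rack) rather than the PROCESS_LOCAL/NODE_LOCAL/RACK_LOCAL constant names; the wait values below are illustrative assumptions, not tuned recommendations:

```properties
# Wait a long time for a process-local slot when process-local tasks exist.
spark.locality.wait.process   30min
# Fall through node-local and rack-local levels almost immediately, so that
# task sets with no process-local tasks are scheduled as soon as possible.
spark.locality.wait.node      1s
spark.locality.wait.rack      1s
```

The same keys can be passed per application via `--conf` on spark-submit instead of cluster-wide defaults, which may be preferable when only this one time-sensitive job needs the skewed waits.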