Hi,

We are running a time-sensitive application with 70 partitions of about 800MB 
each. The application first loads data from a database in a different cluster, 
then applies a filter, caches the filtered data, then applies a map and a 
reduce, and finally collects the results.
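
For reference, here is a minimal sketch of the job shape (the names and data 
are placeholders; the real load reads from a database in another cluster):

    import org.apache.spark.{SparkConf, SparkContext}

    object LoadFilterMapReduceSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("locality-wait-sketch")
        val sc   = new SparkContext(conf)

        // Placeholder for the real database load: 70 partitions (~800MB each).
        val raw = sc.parallelize(1 to 70000, numSlices = 70)

        val filtered = raw.filter(_ % 2 == 0)   // placeholder for our filter
        filtered.cache()                        // cache the filtered data

        val result = filtered.map(_.toLong)     // placeholder for our map
                             .reduce(_ + _)     // placeholder for our reduce
        println(result)                         // result collected on the driver
        sc.stop()
      }
    }
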
The application finishes in about 20 seconds if we set spark.locality.wait to 
a large value (30 minutes), but takes about 100 seconds if we set 
spark.locality.wait to a small value (less than 10 seconds).
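
This is how we switch between the two settings (whether a time string such as 
"1800s" is accepted depends on the Spark version; older versions expect a 
plain number of milliseconds):

    import org.apache.spark.SparkConf

    val conf = new SparkConf().setAppName("my-app")
    conf.set("spark.locality.wait", "1800s")    // large value (30 minutes): ~20s runtime
    // conf.set("spark.locality.wait", "10s")   // small value: ~100s runtime
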
We have analysed the driver log and found a lot of NODE_LOCAL and RACK_LOCAL 
level tasks: a PROCESS_LOCAL task normally takes only about 15 seconds, but a 
NODE_LOCAL or RACK_LOCAL task takes about 70 seconds.

So we decided to set spark.locality.wait to a large value (30 minutes), which 
worked well until we ran into the following problem:

Now our application loads data from HDFS in the same Spark cluster, so it gets 
NODE_LOCAL and RACK_LOCAL level tasks during the loading stage. If the tasks 
in the loading stage all have the same locality level, either NODE_LOCAL or 
RACK_LOCAL, it works fine.
But if the tasks in the loading stage get mixed locality levels, for example 3 
NODE_LOCAL tasks and 2 RACK_LOCAL tasks, then the TaskSetManager of the 
loading stage submits the 3 NODE_LOCAL tasks as soon as resources are offered 
and then waits for spark.locality.wait.node, which we set to 30 minutes, so 
the 2 RACK_LOCAL tasks wait 30 minutes even though resources are available.
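
To make the mechanism concrete, here is the relevant configuration as we 
understand it; the commented-out per-level overrides are only meant to show 
which knobs exist, not something we have verified as a fix:

    import org.apache.spark.SparkConf

    // spark.locality.wait.process, .node and .rack each fall back to
    // spark.locality.wait when not set explicitly, so with the setting below
    // the scheduler waits the full 30 minutes at the NODE_LOCAL level before
    // it will launch the 2 RACK_LOCAL tasks.
    val conf = new SparkConf()
      .set("spark.locality.wait", "1800s")      // 30 minutes, as in our setup
    // Possible per-level caps (untested):
    //   .set("spark.locality.wait.node", "3s")
    //   .set("spark.locality.wait.rack", "3s")
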


Has anyone met this problem? Do you have a nice solution?


Thanks




Ma chong
