Vivek Ratan wrote:
The JT has a number of Map member variables, and I noticed that it uses
TreeMaps for most, if not all, of them. I also noticed that these member
variables are pretty much used for 'puts' and 'gets'. Given that there
is no need for sorted iteration, and the JT doesn't even iterate over
any of these maps, shouldn't it be better to use HashMaps?
We might also want to turn the taskidToTrackerMap into a HashMap<String,
Vector<TaskAttemptID>>. Since the maximum number of TaskAttemptIDs per
TaskTracker is very low, a Vector may be faster and use less memory in
practice (even though its asymptotic complexity is worse).
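
To illustrate the idea, here is a minimal sketch of a hash map whose
values are small lists. The class name TrackerTaskIndex and its methods
are hypothetical, attempt IDs are plain Strings so the example stands
alone, and an ArrayList is swapped in for the Vector since the sketch
does no locking of its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a tracker -> task attempts index.
// Not the real JobTracker code; IDs are Strings to stay self-contained.
class TrackerTaskIndex {
    private final Map<String, List<String>> tasksByTracker =
        new HashMap<String, List<String>>();

    // Record that an attempt runs on the given tracker.
    void add(String trackerName, String attemptId) {
        List<String> attempts = tasksByTracker.get(trackerName);
        if (attempts == null) {
            // Few attempts per tracker are expected, so start small.
            attempts = new ArrayList<String>(4);
            tasksByTracker.put(trackerName, attempts);
        }
        attempts.add(attemptId);
    }

    // Linear scan of a very short list; returns true if something was removed.
    boolean remove(String trackerName, String attemptId) {
        List<String> attempts = tasksByTracker.get(trackerName);
        if (attempts == null) {
            return false;
        }
        boolean removed = attempts.remove(attemptId);
        if (attempts.isEmpty()) {
            tasksByTracker.remove(trackerName);
        }
        return removed;
    }

    // Read-only view of the attempts on a tracker (empty if unknown).
    List<String> attemptsOn(String trackerName) {
        List<String> attempts = tasksByTracker.get(trackerName);
        return attempts == null
            ? Collections.<String>emptyList()
            : Collections.unmodifiableList(attempts);
    }
}
```

Removal is O(n) in the list length instead of O(log n) for a tree-based
set, but with only a handful of attempts per tracker the scan touches
very little memory.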
It might also be a good idea to make getTasksToKill() return its
"killJobIDs" set directly, instead of copying that set into a List and
returning the list. Or even to not use a Set at all, if TaskTrackers
silently drop commands to kill tasks that are already dead.
By the way, my patch in issue HADOOP-3412 also tries to improve the way
containers are used. It replaces jobsByPriority (which was periodically
re-sorted, inefficiently, by resortPriority) with a TreeSet. It also
replaces the taskTrackers TreeMap with a ConcurrentHashMap.
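
As a rough sketch of the TreeSet approach (not the code from the patch):
the set stays ordered on every insert, so no periodic re-sort is needed.
QueuedJob, JobQueue, the field names, and the "lower value is more
urgent" ordering are all assumptions made for this example.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical, simplified job record; not the real JobInProgress.
class QueuedJob {
    final String id;
    final long startTime;
    int priority; // assumption: lower value = more urgent

    QueuedJob(String id, int priority, long startTime) {
        this.id = id;
        this.priority = priority;
        this.startTime = startTime;
    }
}

class JobQueue {
    // Order by priority, then start time, then id so that no two
    // distinct jobs ever compare as equal.
    private static final Comparator<QueuedJob> BY_PRIORITY =
        new Comparator<QueuedJob>() {
            public int compare(QueuedJob a, QueuedJob b) {
                if (a.priority != b.priority) {
                    return a.priority < b.priority ? -1 : 1;
                }
                if (a.startTime != b.startTime) {
                    return a.startTime < b.startTime ? -1 : 1;
                }
                return a.id.compareTo(b.id);
            }
        };

    private final TreeSet<QueuedJob> jobs = new TreeSet<QueuedJob>(BY_PRIORITY);

    void add(QueuedJob job) {
        jobs.add(job);
    }

    // The sort key must not change while the job sits in the set,
    // so remove it, update the priority, then re-insert it.
    void setPriority(QueuedJob job, int newPriority) {
        boolean present = jobs.remove(job);
        job.priority = newPriority;
        if (present) {
            jobs.add(job);
        }
    }

    // Most urgent job, or null if the queue is empty.
    QueuedJob first() {
        return jobs.isEmpty() ? null : jobs.first();
    }
}
```

A priority change then costs two O(log n) operations instead of a full
re-sort of the job list.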
I don't know if it's feasible, but allowing the JobTracker to answer
more than one HeartBeat at the same time (by using concurrent containers
to make its locking finer-grained) could be a good idea. If you think
it's feasible I'll try to do it ^^
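
A very small sketch of what finer-grained state could look like, with a
ConcurrentHashMap in place of one global lock. HeartbeatTable and its
methods are hypothetical; the real heartbeat handling touches far more
state than a last-seen timestamp.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical table of "last heartbeat seen" per tracker.
// Several threads can call heartbeat() at once without a shared lock.
class HeartbeatTable {
    private final ConcurrentHashMap<String, AtomicLong> lastSeen =
        new ConcurrentHashMap<String, AtomicLong>();

    // Record a heartbeat; the stored time never moves backwards.
    void heartbeat(String trackerName, long now) {
        AtomicLong seen = lastSeen.get(trackerName);
        if (seen == null) {
            // putIfAbsent returns the existing value if another thread won.
            seen = lastSeen.putIfAbsent(trackerName, new AtomicLong(now));
            if (seen == null) {
                return; // our fresh entry was installed
            }
        }
        long prev = seen.get();
        // Retry the compare-and-set until our time is stored or is stale.
        while (prev < now && !seen.compareAndSet(prev, now)) {
            prev = seen.get();
        }
    }

    // Last heartbeat time for a tracker, or -1 if it was never seen.
    long lastSeen(String trackerName) {
        AtomicLong seen = lastSeen.get(trackerName);
        return seen == null ? -1L : seen.get();
    }

    int trackerCount() {
        return lastSeen.size();
    }
}
```

Heartbeats from different trackers only contend inside the map itself,
and two heartbeats from the same tracker are resolved by the
compare-and-set loop.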
Brice
PS: Usual warnings about my use of English apply here :-P