Re: using TreeMaps in JobTracker

Brice Arnould Mon, 09 Jun 2008 02:46:57 -0700

Vivek Ratan wrote:

The JT has a number of Map member variables, and I noticed that it uses
TreeMaps for most, if not all, of them. I also noticed that these member
variables are pretty much used for 'puts' and 'gets'. Given that there
is no need for sorted iteration, and the JT doesn't even iterate over
any of these maps, wouldn't it be better to use HashMaps?
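To make the suggestion concrete, here is a minimal sketch of the swap, assuming a map that is only ever accessed by key (put/get/remove) and never iterated in key order. `TrackerLookup` and its methods are hypothetical names for illustration, not actual JobTracker fields. Because callers only see the `Map` interface, the change is confined to the constructor call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a JobTracker-style lookup table (not the real JT code). */
public class TrackerLookup {
    // Before: new TreeMap<>() keeps keys sorted, costing O(log n) comparisons
    // per put/get plus a tree node per entry.
    // After: if nothing iterates in key order, a HashMap gives expected O(1)
    // put/get behind the same Map interface.
    private final Map<String, String> taskToTracker = new HashMap<>();

    public void assign(String taskId, String trackerName) {
        taskToTracker.put(taskId, trackerName);
    }

    public String trackerFor(String taskId) {
        return taskToTracker.get(taskId);
    }

    public String unassign(String taskId) {
        return taskToTracker.remove(taskId);
    }
}
```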

We might also want to turn the taskidToTrackerMap into a HashMap<String,Vector<TaskAttemptID>>. Given that the maximum number of TaskAttemptIDs per TaskTracker is very low, this may let us be faster and use less memory (even though the asymptotic complexity would be higher).
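A rough sketch of a tracker-name to task-attempt index built on small lists, under a few assumptions: task attempt IDs are modelled as plain `String`s to keep it self-contained, `TrackerTaskIndex` is a hypothetical name, and it uses `ArrayList` in place of the `Vector` proposed above (same linear-scan behaviour, minus per-call synchronization, assuming the caller already holds the JobTracker lock). Removal is O(k) in the number of attempts on one tracker rather than O(log k), which is the "greater asymptotic complexity" traded for less memory when k stays small.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tracker-name -> task-attempt index using small lists. */
public class TrackerTaskIndex {
    // Task attempt IDs are modelled as Strings here (assumption).
    private final Map<String, List<String>> tasksByTracker = new HashMap<>();

    public void add(String tracker, String attemptId) {
        // computeIfAbsent creates the per-tracker list on first use.
        tasksByTracker.computeIfAbsent(tracker, k -> new ArrayList<>()).add(attemptId);
    }

    public boolean remove(String tracker, String attemptId) {
        List<String> tasks = tasksByTracker.get(tracker);
        if (tasks == null) {
            return false;
        }
        // Linear scan over this tracker's attempts: cheap while the list is short.
        boolean removed = tasks.remove(attemptId);
        if (tasks.isEmpty()) {
            tasksByTracker.remove(tracker); // drop empty lists to save memory
        }
        return removed;
    }

    public List<String> tasksOn(String tracker) {
        List<String> tasks = tasksByTracker.get(tracker);
        return tasks == null ? Collections.<String>emptyList() : Collections.unmodifiableList(tasks);
    }
}
```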

It might also be a good idea to make getTasksToKill() return its set "killJobIDs" directly, instead of copying that set into a List and returning that list. We could even avoid using a Set at all, if TaskTrackers silently drop commands to kill tasks that are already dead.
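A small sketch contrasting the two ways of handing back the kill set. `KillList`, `markForKill` and the two getter names are hypothetical; the real getTasksToKill() signature is not reproduced here. The view-based getter avoids copying the elements, with one caveat worth stating: the view reflects later changes, so the caller has to finish iterating before the owner modifies the set again.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical holder for the IDs a tracker should be told to kill. */
public class KillList {
    private final Set<String> killIds = new HashSet<>();

    public void markForKill(String id) {
        killIds.add(id); // the Set de-duplicates repeated kill requests
    }

    // Copying version: builds a fresh list of all elements on every call.
    public List<String> getTasksToKillCopy() {
        return new ArrayList<>(killIds);
    }

    // Direct version: returns a read-only view of the existing set without
    // copying its elements. The view sees later additions, so callers must
    // not hold on to it across modifications.
    public Collection<String> getTasksToKillView() {
        return Collections.unmodifiableSet(killIds);
    }
}
```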

By the way, my patch in issue HADOOP-3412 also tries to improve the way containers are used. It replaces jobsByPriority (which was periodically re-sorted by resortPriority, in an inefficient way) with a TreeSet. It also replaces the TreeMap taskTrackers with a ConcurrentHashMap.

I don't know if it's feasible, but allowing the JobTracker to answer more than one heartbeat at the same time (by using concurrent containers to make its locking finer-grained) could be a good idea. If you think it's feasible, I'll try to do it ^^
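A sketch of the two container changes described above, not the actual HADOOP-3412 patch: `PriorityJobs`, its nested `Job` record, and the assumption that a lower priority value is scheduled first are all hypothetical. The TreeSet stays sorted on every insert and remove (O(log n)), so no periodic re-sort is needed. Two details matter for correctness: the comparator must break ties (here by start time, then job ID) or the TreeSet will silently drop distinct jobs that compare equal, and a job's priority must never be changed while it sits in the set, so a priority change is done as remove, rebuild, re-insert. The ConcurrentHashMap lets several heartbeat threads read and update tracker status without one global lock.

```java
import java.util.Comparator;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical scheduler state; not the actual JobTracker classes. */
public class PriorityJobs {
    /** Minimal immutable job record; lower priority value runs first (assumption). */
    public static final class Job {
        public final String id;
        public final int priority;
        public final long startTime;

        public Job(String id, int priority, long startTime) {
            this.id = id;
            this.priority = priority;
            this.startTime = startTime;
        }
    }

    // Priority, then submission time, then ID: distinct jobs never compare equal.
    private static final Comparator<Job> ORDER =
        Comparator.comparingInt((Job j) -> j.priority)
                  .thenComparingLong(j -> j.startTime)
                  .thenComparing(j -> j.id);

    // Kept sorted on every add/remove, so no periodic re-sort is required.
    private final TreeSet<Job> jobsByPriority = new TreeSet<>(ORDER);

    // Tracker status keyed by tracker name; safe for concurrent heartbeats.
    private final ConcurrentMap<String, String> taskTrackers = new ConcurrentHashMap<>();

    public synchronized void addJob(Job job) {
        jobsByPriority.add(job);
    }

    /** Never mutate an element in place: remove it, rebuild it, re-insert it. */
    public synchronized Job changePriority(Job job, int newPriority) {
        jobsByPriority.remove(job);
        Job updated = new Job(job.id, newPriority, job.startTime);
        jobsByPriority.add(updated);
        return updated;
    }

    public synchronized Job nextJob() {
        return jobsByPriority.isEmpty() ? null : jobsByPriority.first();
    }

    public void heartbeat(String trackerName, String status) {
        taskTrackers.put(trackerName, status);
    }

    public String statusOf(String trackerName) {
        return taskTrackers.get(trackerName);
    }
}
```

In this sketch the job set is still guarded by the object's monitor; only the tracker map is lock-free, which is the kind of split that would let heartbeats that only touch tracker state proceed in parallel.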


Brice

PS: The usual warnings about my use of English apply here :-P
