Re: using TreeMaps in JobTracker

Brice Arnould Mon, 09 Jun 2008 05:15:25 -0700

Devaraj Das wrote:
>> It might also be a good idea to make getTasksToKill() return its set
>> "killJobIDs" directly, instead of copying that set into a List and
>> returning that list. Or even to not use a Set at all, if TaskTrackers
>> silently drop commands to kill already-dead tasks.
>
> A tasktracker wouldn't know that it has to kill something unless
> explicitly told about it (imagine that the user just fired a command to kill
> a job, or the tasktracker is running a speculative task and another attempt
> of the same task just finished). I am not sure I understood you right, though.

Sorry, let me try to explain it better:

The context is that JobTracker.getTasksToKill(taskTracker) goes through
the list of tasks associated with taskTracker, creates a set called
killJobIds, and fills it with some of those tasks. It then copies the
contents of killJobIds into a list called killList and returns that list.
The contents of killList are then copied into another list inside
JobTracker.heartbeat().


I suggest two changes:

1- Make JobTracker.getTasksToKill(taskTracker) return a Collection, and
make killJobIds that Collection, removing the need to copy its contents
into killList.

2- Change the type of killJobIds from Set to ArrayList, since it cannot
contain duplicate elements anyway: its elements are extracted from
another set.


Together, the two changes should reduce the number of allocations and the complexity.
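To make the two changes above concrete, here is a minimal sketch. It is not the actual JobTracker code: the class name, the String element type, and the markedForKill() predicate are hypothetical stand-ins, and it assumes, as described above, that the elements are taken unchanged from an existing set.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class KillListSketch {
    // Hypothetical predicate standing in for the real "must be killed" test.
    static boolean markedForKill(String id) {
        return id.startsWith("dead_");
    }

    // Shape described above: fill a Set, then copy it into a List.
    static List<String> getTasksToKillCopying(Set<String> trackerTasks) {
        Set<String> killJobIds = new HashSet<>();
        for (String id : trackerTasks) {
            if (markedForKill(id)) {
                killJobIds.add(id);
            }
        }
        return new ArrayList<>(killJobIds); // second allocation plus a copy
    }

    // Suggested shape: collect straight into an ArrayList and return it
    // as a Collection. The input is a Set, so no duplicates can appear.
    static Collection<String> getTasksToKillDirect(Set<String> trackerTasks) {
        List<String> killJobIds = new ArrayList<>();
        for (String id : trackerTasks) {
            if (markedForKill(id)) {
                killJobIds.add(id);
            }
        }
        return killJobIds;
    }
}
```

Both versions return the same elements; the second one only skips the intermediate Set and the copy. This holds only while the no-duplicates assumption is true.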

The digression about the taskTracker's behaviour was a question about
whether or not it is important for killJobIds to contain no duplicates.

>> By the way, my patch in the issue HADOOP-3412 also tries to improve
>> the way containers are used. It replaces jobsByPriority (which was
>> periodically re-sorted by resortPriority, and inefficiently) with a
>> TreeSet. It also replaces the TreeMap taskTrackers with a
>> ConcurrentHashMap. I don't know if it's feasible, but allowing the
>> JobTracker to answer more than one heartbeat at the same time (by
>> using concurrent containers to make its locking finer-grained) could
>> be a good idea. If you think it's feasible I'll try to do it ^^
>
> Answering more than one heartbeat at the same time is interesting. Could you
> please elaborate on that? Some time back we were thinking of queuing up the
> heartbeats and processing them asynchronously. Are you talking about the
> same thing?

Yes. What I suggest is to make the "synchronized areas" smaller by using
concurrent containers, and then to use a ThreadPool to answer heartbeats.

If you think that it is possible, I'll try to do it.
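As a rough illustration of the idea only (this is not JobTracker code), the sketch below keeps per-tracker state in a ConcurrentHashMap and handles heartbeats on a fixed thread pool, so no method-wide lock is needed. The class, its methods, and the "last sequence number" state are hypothetical, and it is written in current Java syntax for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class HeartbeatPoolSketch {
    // Hypothetical state: tracker name -> highest heartbeat sequence seen.
    private final Map<String, Long> lastSeqByTracker = new ConcurrentHashMap<>();
    private final ExecutorService pool;

    HeartbeatPoolSketch(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    // Each heartbeat is handled on a pool thread. merge() is atomic per key
    // on a ConcurrentHashMap, so different trackers do not block each other.
    Future<Long> heartbeat(String tracker, long seq) {
        return pool.submit(() -> lastSeqByTracker.merge(tracker, seq, Long::max));
    }

    Long lastSeq(String tracker) {
        return lastSeqByTracker.get(tracker);
    }

    int trackerCount() {
        return lastSeqByTracker.size();
    }

    // Stop accepting heartbeats and wait for the queued ones to finish.
    void shutdownAndWait() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("heartbeats still running");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The real heartbeat handling touches several structures at once, so it would probably still need some locking around the steps that must stay consistent with each other; the sketch only shows the single-map case.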

Please forgive my English :-/ Next year I'll be studying in Oregon, so it
should be better after that ^^


Brice
