Devaraj Das wrote:
>> It might also be a good idea to make getTasksToKill() return
>> its set "killJobIDs" directly, instead of copying that set
>> into a List and returning that list. Or even to not use a Set
>> at all, if TaskTrackers silently drop commands to kill tasks
>> that are already dead.
> A tasktracker wouldn't know that it has to kill something unless
> explicitly told about it (imagine that the user just fired a command to kill
> a job, or the tasktracker is running a speculative task and another attempt
> of the same task just finished). I am not sure I understood you right, though.
Sorry, let me try to explain it better:
The context is that JobTracker.getTasksToKill(taskTracker) goes through
the list of tasks associated with taskTracker, creates a set called
killJobIds and fills it with some of those tasks. Then it copies the
content of killJobIds into a list called killList and returns that list.
The content of killList is then copied into yet another list inside
JobTracker.heartbeat().
I suggest two changes:
1- Make JobTracker.getTasksToKill(taskTracker) return a Collection, and
make killJobIds that Collection, removing the need to copy its
content into killList.
2- Change the type of killJobIds from Set to ArrayList, since it
cannot contain duplicate elements anyway: its elements are extracted
from another set.
Together, the two changes should reduce the number of allocations and the
complexity.
The digression about the taskTracker's behaviour was a question about
whether it is important or not for killJobIds to not contain duplicates.
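
To make the two suggestions concrete, here is a minimal sketch. It is not the actual JobTracker code: the class KillListSketch, the use of String task ids and the shouldKill predicate are all assumptions made up for illustration. The first method mimics the current shape (fill a Set, then copy it into a List); the second returns the collection it built directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for the logic discussed above; not Hadoop code.
class KillListSketch {

    // Current shape as described: fill a Set, then copy it into a List.
    static List<String> tasksToKillWithCopy(Set<String> trackerTasks,
                                            Predicate<String> shouldKill) {
        Set<String> killJobIds = new LinkedHashSet<>();
        for (String taskId : trackerTasks) {
            if (shouldKill.test(taskId)) {
                killJobIds.add(taskId);
            }
        }
        return new ArrayList<>(killJobIds); // second allocation plus a copy
    }

    // Suggested shape: the source is already a Set, so no duplicates can
    // appear; an ArrayList is enough and is returned without copying.
    static Collection<String> tasksToKill(Set<String> trackerTasks,
                                          Predicate<String> shouldKill) {
        List<String> killJobIds = new ArrayList<>();
        for (String taskId : trackerTasks) {
            if (shouldKill.test(taskId)) {
                killJobIds.add(taskId);
            }
        }
        return killJobIds;
    }

    public static void main(String[] args) {
        Set<String> tasks = new LinkedHashSet<>(Arrays.asList("t1", "t2", "t3"));
        // Pretend every task except t2 has to be killed.
        System.out.println(tasksToKill(tasks, id -> !id.equals("t2")));
    }
}
```

Both methods return the same elements in the same order; the second one simply skips the intermediate Set and the copy.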
>> By the way, my patch in the issue HADOOP-3412 also tries to
>> improve the way containers are used. It replaces
>> jobsByPriority (which was periodically, and inefficiently,
>> re-sorted by resortPriority) with a TreeSet. It also
>> replaces the TreeMap taskTrackers with a ConcurrentHashMap.
>> I don't know if it's feasible, but allowing the JobTracker to
>> answer more than one heartbeat at the same time (by using
>> concurrent containers to make its locking finer-grained) could
>> be a good idea. If you think it's feasible I'll try to do it ^^
> Answering more than one heartbeat at the same time is interesting. Could you
> please elaborate on that? Some time back we were thinking of queuing up the
> heartbeats and processing them asynchronously. Are you talking about the
> same thing?
Yes. What I suggest is to make the "synchronized areas" smaller using
concurrent containers and then to use a ThreadPool to answer heartbeats.
If you think that it is possible, I'll try to do it.
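
As a rough sketch of what I mean (not a patch): the class HeartbeatSketch, its methods and the "last seen timestamp" state are hypothetical, chosen only to show the idea. The shared state lives in a ConcurrentHashMap, so each update is atomic per tracker without one big synchronized block, and a fixed thread pool processes heartbeats in parallel.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; the real JobTracker keeps far more state.
class HeartbeatSketch {
    // Tracker name -> most recent heartbeat timestamp seen so far.
    private final ConcurrentHashMap<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final ExecutorService pool;

    HeartbeatSketch(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Queue a heartbeat; a pool thread records it. merge() is atomic per key,
    // so heartbeats from different trackers do not block each other.
    Future<Long> heartbeat(String trackerName, long timestamp) {
        Callable<Long> work = () -> lastSeen.merge(trackerName, timestamp, Long::max);
        return pool.submit(work);
    }

    Long lastSeen(String trackerName) {
        return lastSeen.get(trackerName);
    }

    // Stop accepting heartbeats and wait for the queued ones to finish.
    boolean shutdown() {
        pool.shutdown();
        try {
            return pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        HeartbeatSketch jt = new HeartbeatSketch(4);
        for (int i = 0; i < 100; i++) {
            jt.heartbeat("tracker-" + (i % 5), i);
        }
        jt.shutdown();
        System.out.println(jt.lastSeen("tracker-4"));
    }
}
```

The hard part in the real JobTracker would be the state that several heartbeats touch together (job and task tables), which this sketch does not address.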
Please forgive my English :-/ Next year I'll be studying in
Oregon, so it should be better after that ^^
Brice