Thanks Suraj! Another approach is to select the groom server that holds the most blocks of the split, to get maximum locality (when splitSize > HDFS block size, a split may span several HDFS blocks). Your point that the hostname may be in a different rack is another good hint.
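To make the idea concrete, here is a rough sketch in plain Java. It is not the Hama code: `GroomSlots`, `pickGroom` and the `blockHosts` input are hypothetical names made up for illustration, and I assume the caller already knows which hosts hold each block of the split. It counts how many blocks each host holds and returns the host with the most blocks that still has a free task slot. It needs Java 8 for `Map.merge`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MostLocalGroomSketch {

  /** Hypothetical stand-in for the slot info a groom status would carry. */
  static final class GroomSlots {
    final int runningTasks;
    final int maxTasks;

    GroomSlots(int runningTasks, int maxTasks) {
      this.runningTasks = runningTasks;
      this.maxTasks = maxTasks;
    }

    boolean hasFreeSlot() {
      return runningTasks < maxTasks;
    }
  }

  /**
   * @param blockHosts one entry per block of the split, listing the hosts
   *                   that store a replica of that block (assumed input)
   * @param grooms     groom host name to its slot usage (assumed input)
   * @return the host holding the most blocks among grooms with a free slot,
   *         or null if no such groom exists; ties are broken arbitrarily
   */
  static String pickGroom(List<List<String>> blockHosts,
                          Map<String, GroomSlots> grooms) {
    // Count how many blocks of the split each host stores.
    Map<String, Integer> blocksPerHost = new HashMap<>();
    for (List<String> hosts : blockHosts) {
      for (String host : hosts) {
        blocksPerHost.merge(host, 1, Integer::sum);
      }
    }

    // Keep the host with the highest count that can still accept a task.
    String best = null;
    int bestCount = 0;
    for (Map.Entry<String, Integer> e : blocksPerHost.entrySet()) {
      GroomSlots slots = grooms.get(e.getKey());
      if (slots == null || !slots.hasFreeSlot()) {
        continue; // not a known groom, or no free slot
      }
      if (e.getValue() > bestCount) {
        best = e.getKey();
        bestCount = e.getValue();
      }
    }
    return best;
  }
}
```

If no groom holding a block has a free slot, the method returns null and the caller would have to fall back to some non-local choice. Rack awareness is left out of this sketch.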

Micle Bu

On Wed, Apr 17, 2013 at 11:29 PM, Suraj Menon <[email protected]> wrote:

> Good catch! Yes, the logic is to find the first groom server that has the
> split and has available slots for execution.
> You might note that depending on the HDFS allocation, this hostname might
> not be in the same rack. You are welcome to fix this.
>
>
> On Wed, Apr 17, 2013 at 11:15 AM, Micle Bu <[email protected]> wrote:
>
> > Hi all,
> >
> > I'm learning about data locality in Hama, and found there is a
> > BestEffortDataLocalTaskAllocator class for this purpose. It's a good idea
> > to assign a task to the groom which contains its split;
> > getGroomToSchedule() plays this role.
> >
> > Well, in getGroomToSchedule(), the code looks like:
> >
> >   GroomServerStatus groom = grooms.get(location);
> >   ...
> >   if (taskInGroom < groom.getMaxTasks() &&
> >       location.equals(groom.getGroomHostName())) {
> >     return groom.getGroomHostName();
> >   }
> >
> > It seems that location.equals(groom.getGroomHostName()) is always true, so
> > it just selects the first groom which contains the split? Am I right?
> >
> > Thanks in advance!
> >
> > Micle Bu
