Lars: Did you forget to attach the log (pastebin)? I assume this scenario could happen in 0.20.6 as well.
On Thu, Nov 25, 2010 at 2:49 AM, Lars George <lars.geo...@gmail.com> wrote:
> Hi,
>
> Talking to Lars we found a weird series of events that led to dying
> task attempts during the bulk job they are running. See the attached
> log. Region in question is:
>
> raw_occurrence_record,28256992928,1290626317732.86e3cd5c5d0f22430debb36f1668d3fc
>
> After a split the region is assigned to a new server (c1n6) but then
> immediately ripped out underneath it because the server is deemed to
> be too loaded. Then it releases that region (after a few minutes) and
> a few others and then that new daughter region, which did not have
> time yet to serve anything really, is reassigned to.... you guessed it
> probably, the same server as that is now deemed (?) to be the least
> loaded one. So that new daughter is not available for minutes causing
> the task attempt to die.
>
> I see that in trunk this is all replaced. No RegionManager but an
> AssignmentManager who does random node selection. Is that all obsolete
> now and this should be improved on trunk?
>
> Lars