Hi,

While talking to Lars, we found a weird series of events that caused task attempts to die during the bulk job they are running. See the attached log. The region in question is:
raw_occurrence_record,28256992928,1290626317732.86e3cd5c5d0f22430debb36f1668d3fc

After a split, the region is assigned to a new server (c1n6) but is then immediately ripped out from underneath it because the server is deemed to be too loaded. The server then releases that region and a few others (after a few minutes), and the new daughter region, which has not yet had time to serve anything, is reassigned to... you probably guessed it, the same server, since it is now deemed (?) to be the least loaded one. As a result, the new daughter is unavailable for minutes, which causes the task attempt to die.

I see that in trunk this has all been replaced: there is no RegionManager, but an AssignmentManager that does random node selection. Is that all obsolete now, and should this be improved on trunk?

Lars