JIRA creation is done: https://issues.apache.org/jira/browse/HBASE-17791
I will have a look into it over the next few days, maybe I can come up with a patch.

On Wed, Mar 15, 2017 at 6:14 AM, Stack <[email protected]> wrote:
> File a blocker please, Lars. I'm pretty sure the boolean on whether we are
> doing a recovery or not has been there a long time, so yeah, a single server
> recovery could throw us off. But you make a practical point: one server
> should not destroy locality over the cluster.
>
> St.Ack
>
> On Tue, Mar 14, 2017 at 7:09 AM, Lars George <[email protected]> wrote:
>> Wait, HBASE-15251 is not enough methinks. The checks added help, but they
>> do not cover all the possible edge cases. In particular, say a node really
>> fails: why not just reassign the few regions it did hold and leave all the
>> others where they are? Seems insane as it is.
>>
>> On Tue, Mar 14, 2017 at 2:24 PM, Lars George <[email protected]> wrote:
>>> Doh, https://issues.apache.org/jira/browse/HBASE-15251 addresses this
>>> (though I am not sure exactly how, see below). This should be backported
>>> to all 1.x branches!
>>>
>>> As for the patch, I see this:
>>>
>>>   if (!failover) {
>>>     // Fresh cluster startup.
>>> -   LOG.info("Clean cluster startup. Assigning user regions");
>>> +   LOG.info("Clean cluster startup. Don't reassign user regions");
>>>     assignAllUserRegions(allRegions);
>>> + } else {
>>> +   LOG.info("Failover! Reassign user regions");
>>>   }
>>>
>>> It would be interesting to see how that log message is actually
>>> reassigning the user regions. I would have assumed that
>>> "assignAllUserRegions()" would have to be called.
>>>
>>> On Tue, Mar 14, 2017 at 2:21 PM, Lars George <[email protected]> wrote:
>>>> Looking at the code more... it seems the issue is here, in
>>>> AssignmentManager.processDeadServersAndRegionsInTransition():
>>>>
>>>>   ...
>>>>   failoverCleanupDone();
>>>>   if (!failover) {
>>>>     // Fresh cluster startup.
>>>>     LOG.info("Clean cluster startup. Assigning user regions");
>>>>     assignAllUserRegions(allRegions);
>>>>   }
>>>>   ...
>>>>
>>>> As soon as a single node has failed, failover is set to true, so no
>>>> assignment is done here. Later on, the servers are recovered from the
>>>> "crash" (which a clean restart counts as, too), and then all unassigned
>>>> regions (all of them at this point, since they were unassigned earlier)
>>>> are assigned round-robin.
>>>>
>>>> On Tue, Mar 14, 2017 at 1:54 PM, Lars George <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> I have had this happen on multiple clusters recently: after a restart,
>>>>> the locality dropped from close to (or exactly) 100% down to single
>>>>> digits. The reason is that all regions were completely shuffled and
>>>>> reassigned to random servers. Upon reading the (yet again non-trivial)
>>>>> assignment code, I found that a single missing server will trigger a
>>>>> full "recovery" of all servers, which includes dropping all previous
>>>>> assignments and assigning everything randomly anew.
>>>>>
>>>>> This is just terrible! In addition, I had assumed that, at least with
>>>>> the StochasticLoadBalancer, the balancer checks which node holds most
>>>>> of a region's data, locality-wise, and picks that server. But that is
>>>>> _not_ the case! It just spreads everything seemingly randomly across
>>>>> the servers.
>>>>>
>>>>> To me this is a big regression (or a straight-up bug), given that a
>>>>> single server out of, for example, hundreds could trigger this and
>>>>> destroy locality completely. Running a major compaction to restore
>>>>> locality is not a viable approach, for many reasons.
>>>>>
>>>>> This used to work better, so why the regression?
>>>>>
>>>>> Lars
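
For what it is worth, the "reassign only the dead server's regions" idea from the thread boils down to a retain-assignment pass. Here is a minimal sketch of that idea, using plain String stand-ins for HBase's HRegionInfo and ServerName types; the class name and the oldAssignments/liveServers inputs are hypothetical, not HBase's actual retainAssignment API:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class RetainAssignmentSketch {

        // Keep every region on its previous server if that server is still
        // alive; only regions from dead servers go back into the pool.
        static Map<String, List<String>> retainAssignments(
                Map<String, String> oldAssignments, Set<String> liveServers) {
            Map<String, List<String>> plan = new HashMap<>();
            List<String> orphaned = new ArrayList<>();

            for (Map.Entry<String, String> e : oldAssignments.entrySet()) {
                String region = e.getKey();
                String previousServer = e.getValue();
                if (liveServers.contains(previousServer)) {
                    // Server survived the restart: keep the region in place
                    // so its HDFS block locality is preserved.
                    plan.computeIfAbsent(previousServer, s -> new ArrayList<>())
                        .add(region);
                } else {
                    // Only regions from genuinely dead servers need a new home.
                    orphaned.add(region);
                }
            }

            // Round-robin only the orphaned regions across the live servers
            // (assumes liveServers is non-empty).
            List<String> servers = new ArrayList<>(liveServers);
            int i = 0;
            for (String region : orphaned) {
                plan.computeIfAbsent(servers.get(i++ % servers.size()),
                    s -> new ArrayList<>()).add(region);
            }
            return plan;
        }
    }

With a single dead server out of hundreds, this keeps locality intact everywhere except for the handful of orphaned regions, which is the behavior the thread asks for.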
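
On the locality point about the StochasticLoadBalancer: a locality-aware placement would pick, per region, the server that holds the largest fraction of the region's HDFS blocks. A minimal sketch follows, assuming the per-server locality fractions are given (in HBase they would be derived from HDFS block locations; this is not the balancer's actual interface):

    import java.util.Map;

    public class LocalityAwarePickSketch {

        // localityByServer maps server name -> fraction of the region's
        // HDFS blocks stored on that server, in [0.0, 1.0].
        static String pickMostLocalServer(Map<String, Float> localityByServer) {
            String best = null;
            float bestLocality = -1f;
            for (Map.Entry<String, Float> e : localityByServer.entrySet()) {
                if (e.getValue() > bestLocality) {
                    bestLocality = e.getValue();
                    best = e.getKey();
                }
            }
            return best;
        }
    }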
