DomGarguilo commented on issue #1791: URL: https://github.com/apache/accumulo/issues/1791#issuecomment-767109627
> * the suspend time elapsed and the tablets are free to migrate before we are able to check (possibly because the tserver took too long to restart or WAL recovery on tablet load is causing the suspend time to elapse before we check the tablet states)

It seems like the suspend time elapsing could very well be the cause of this issue. I don't think the time taken for the tserver to restart has any bearing on it, though, because the inconsistent tablet locations are recorded before the tservers are restarted.

> * the tserver fully recovered and then participated in subsequent migrations

I do not think this can be the case, because the tservers are not restarted until after the issue-causing assert occurs. The error happens between the time the pre-death tablet locations are noted and the point at which the suspended tablets are gathered and compared against their pre-suspend locations.

> * we're splitting tablets and creating new migrations (possibly because splits weren't stabilized before killing the tserver)

I'm not sure that this can be the case, because the splits happen near the start of the test and time is then allowed for migrations and such to finish. A printout of the tablet locations after balancing consistently shows the 3 servers used in this test with 10 tablets on each, which seems balanced.

> * the balancer is misbehaving, and rebalancing when it isn't supposed to

Seems very likely, but I am having trouble looking into this.

> * they migrated before we killed the tserver, but we didn't record it correctly in the test

This could be the case, but it does not seem likely. There is a brief window of time between when the tablet locations are recorded and when the tservers are killed; however, I have tried allowing extra time before this happens for migrations to finish, and it does not seem to increase the reliability of this test.

Hopefully I did not misinterpret any of your points, @ctubbsii.
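
For anyone following along, the comparison I'm describing (pre-death tablet locations versus the locations recorded for the suspended tablets) can be sketched roughly like this. It is only an illustration using plain Java maps: `SuspendCheckSketch`, `mismatches`, and the tablet/server strings are made-up names for this comment, not the actual test code or any Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the Accumulo test.
final class SuspendCheckSketch {

  // preDeath:  tablet -> tserver, noted before the tservers are killed
  // suspended: tablet -> tserver recorded in the tablet's suspension entry
  // Returns one message per suspended tablet whose recorded server differs
  // from the server it was hosted on before the kill.
  static List<String> mismatches(Map<String,String> preDeath,
      Map<String,String> suspended) {
    List<String> problems = new ArrayList<>();
    // Copy into a TreeMap so the output order is stable (sorted by tablet).
    for (Map.Entry<String,String> e : new TreeMap<>(suspended).entrySet()) {
      String before = preDeath.get(e.getKey());
      if (!e.getValue().equals(before)) {
        problems.add(e.getKey() + ": was on " + before
            + ", suspended from " + e.getValue());
      }
    }
    return problems;
  }
}
```

An empty result would mean every suspended tablet is attributed to the server it was on before the kill; a non-empty one corresponds to the inconsistent locations that trip the assert.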
I'm also not sure why the formatting is messed up with quoting the bullets.
