DomGarguilo commented on issue #1791:
URL: https://github.com/apache/accumulo/issues/1791#issuecomment-767109627


   >     * the suspend time elapsed and the tablets are free to migrate before
   >       we are able to check (possibly because the tserver took too long to
   >       restart or WAL recovery on tablet load is causing the suspend time to
   >       elapse before we check the tablet states)
   The suspend time elapsing could very well be the cause of this issue. I don't
   think the time taken for the tserver to restart has any bearing on it, though,
   because the inconsistent tablet locations are recorded before the tservers are
   restarted.
   >     * the tserver fully recovered and then participated in subsequent
   >       migrations
   I do not think this can be the case because the tservers are not restarted
   until after the assert that triggers this issue fails. The error happens
   between the time the pre-death tablet locations are noted and the point at
   which the suspended tablets are gathered and compared to their pre-suspend
   locations.
   >     * we're splitting tablets and creating new migrations (possibly
   >       because splits weren't stabilized before killing the tserver)
   I'm not sure that this can be the case because the splits happen near the
   start of the test, and then time is allowed for migrations and such to finish.
   A printout of the tablet locations after balancing consistently shows the 3
   servers used in this test with 10 tablets on each, which seems balanced.
   >     * the balancer is misbehaving, and rebalancing when it isn't supposed
   >       to
   Seems very likely, but I am having trouble looking into this.
   >     * they migrated before we killed the tserver, but we didn't record it
   >       correctly in the test
   This could be the case but does not seem likely. There is a brief window of
   time between when the tablet locations are recorded and when the tservers are
   killed; however, I have tried allowing time before this happens for migrations
   to finish, and it does not seem to increase the reliability of this test.
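
One way to narrow down that window is to snapshot the tablet locations twice (once when they are recorded and again just before the tservers are killed) and diff the two. The following is only a minimal sketch of that idea in Python, not the actual test code (which is Java); the function names, tablet names, and server names are all hypothetical, and the data is made up to mirror the 3 servers with 10 tablets each mentioned above.

```python
from collections import Counter


def find_moved_tablets(before, after):
    """Return {tablet: (old_server, new_server)} for every tablet whose
    location differs between the two snapshots (hypothetical helper)."""
    moved = {}
    for tablet, old_server in before.items():
        new_server = after.get(tablet)  # None if the tablet has no location
        if new_server != old_server:
            moved[tablet] = (old_server, new_server)
    return moved


def tablets_per_server(locations):
    """Count how many tablets each server hosts in a snapshot."""
    return Counter(locations.values())


# Made-up snapshot: 30 tablets spread evenly over 3 servers.
before = {f"tablet{i:02d}": f"tserver{i % 3}" for i in range(30)}

# Second snapshot with one simulated migration inside the window.
after = dict(before)
after["tablet04"] = "tserver2"

print(tablets_per_server(before))
print(find_moved_tablets(before, after))
```

If the diff is empty right before the kill but the assert still fails, that would point away from a late migration being missed by the test and toward something happening after the tservers die.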
   
   Hopefully I did not misinterpret any of your points, @ctubbsii. I'm also not
   sure why the formatting is messed up with quoting the bullets.
   
   

