ctubbsii commented on issue #1791:
URL: https://github.com/apache/accumulo/issues/1791#issuecomment-767294128


   Hmm, based on where it's failing, it seems possible that not all tablets
have been marked as suspended yet. The loop condition on line 213 only waits
until there are *some* suspended tablets for at least two of the killed
tservers. It doesn't loop until *all* tablets hosted on those tservers are
marked as suspended, which is what the failing check on line 225 requires.
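To make the mismatch concrete, here is a minimal sketch that contrasts the two conditions. It assumes a hypothetical, simplified model (the `Tablet` record and both method names are illustrative only, not Accumulo's API or the test's actual code): the wait loop is satisfied as soon as enough dead tservers have *some* suspended tablet, while the assertion needs *every* tablet from a dead tserver to be suspended.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model for illustration; not Accumulo's API.
class SuspensionConditions {

  // A tablet, the tserver it was last hosted on, and whether it is suspended.
  record Tablet(String extent, String lastServer, boolean suspended) {}

  // Roughly what the wait loop checks: at least `minServers` distinct dead
  // tservers have *some* suspended tablet.
  static boolean someSuspendedOnEnoughDeadServers(
      List<Tablet> tablets, Set<String> dead, int minServers) {
    long servers = tablets.stream()
        .filter(t -> t.suspended() && dead.contains(t.lastServer()))
        .map(Tablet::lastServer)
        .distinct()
        .count();
    return servers >= minServers;
  }

  // What the failing assertion effectively requires: *every* tablet that was
  // on a dead tserver is already suspended.
  static boolean allSuspendedOnDeadServers(List<Tablet> tablets, Set<String> dead) {
    return tablets.stream()
        .filter(t -> dead.contains(t.lastServer()))
        .allMatch(Tablet::suspended);
  }

  public static void main(String[] args) {
    Set<String> dead = Set.of("ts1", "ts2");
    List<Tablet> tablets = List.of(
        new Tablet("a", "ts1", true),
        new Tablet("b", "ts1", false), // on a dead tserver, not yet suspended
        new Tablet("c", "ts2", true),
        new Tablet("d", "ts3", false)); // live tserver, irrelevant here
    System.out.println(someSuspendedOnEnoughDeadServers(tablets, dead, 2)); // true
    System.out.println(allSuspendedOnDeadServers(tablets, dead)); // false
  }
}
```

With tablet `b` still unsuspended, the loop's condition already holds while the assertion's condition does not, which is the window the hunch points at.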
   
   The comment above line 225 says "All suspended tablets should "belong" to
the dead tablet servers". However, that's not what it actually checks. What it
checks is that "all tablets belonging to the dead tablet servers are now
suspended", and that's not a condition the loop at line 213 waited for.
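The difference between the two statements is the direction of the subset relation. As a minimal sketch under a hypothetical model (the `Tablet` record, its `suspendedFrom` field, and the method name are illustrative, not the test's real code), a check that verifies only what the comment says could look like this: every suspended tablet must point at a dead tserver, but unsuspended tablets from dead tservers are tolerated.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model for illustration; not Accumulo's API.
class RelaxedSuspensionCheck {

  // `suspendedFrom` is the tserver a suspended tablet is waiting for,
  // or null if the tablet is not suspended.
  record Tablet(String extent, String suspendedFrom) {}

  // Checks only what the test comment states: every suspended tablet belongs
  // to one of the dead tservers. Tablets from dead tservers that have not
  // been suspended yet do not cause a failure.
  static boolean allSuspendedBelongToDead(List<Tablet> tablets, Set<String> dead) {
    return tablets.stream()
        .filter(t -> t.suspendedFrom() != null)
        .allMatch(t -> dead.contains(t.suspendedFrom()));
  }

  public static void main(String[] args) {
    Set<String> dead = Set.of("ts1", "ts2");
    List<Tablet> tablets = List.of(
        new Tablet("a", "ts1"),
        new Tablet("b", null), // was on ts1, not yet suspended: still passes
        new Tablet("c", "ts2"));
    System.out.println(allSuspendedBelongToDead(tablets, dead)); // true
  }
}
```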
   
   We could either wait until all tablets assigned to the dead tservers are
marked as suspended, OR relax the check on line 225 to verify only what the
comment says, without assuming that every tablet that had been assigned to
those tservers has been suspended yet.
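For the first option, the wait loop would need to poll until the stronger condition holds. Here is a minimal, generic sketch of such a polling helper; the `waitFor` name, its parameters, and the timeout/poll values are assumptions for illustration, not something taken from the test or from Accumulo's test utilities. In the real test, the supplied condition would re-read tablet states and return true only once *every* tablet previously hosted on a killed tserver is suspended.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper for illustration; not part of Accumulo.
class WaitForCondition {

  // Polls `condition` until it holds or `timeoutMillis` elapses.
  // Returns true if the condition was met, false on timeout or interrupt.
  static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Stand-in for "number of tablets on dead tservers not yet suspended",
    // decremented on each poll to simulate suspension catching up.
    int[] notYetSuspended = {3};
    boolean ok = waitFor(() -> notYetSuspended[0]-- <= 0, 5_000, 10);
    System.out.println(ok); // true
  }
}
```

Comparing against a deadline from `System.nanoTime()` rather than counting iterations keeps the timeout independent of how long each condition check takes.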
   
   This is just a hunch, though, that not all the tablets have been marked as
suspended yet. You could verify it by checking the state of the other tablets
that were assigned to the dead tserver at the point where the equality check on
line 225 would have failed (something like replacing `assertEquals` with `if not
equal, then dump ds.hosted and fail()`).
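A minimal sketch of that diagnostic, assuming a hypothetical `hosted` map from tablet name to a state string as a stand-in for `ds.hosted` (its real type in the test may differ). In a JUnit test the `throw new AssertionError(...)` would be `fail(...)`, which also throws an `AssertionError`; plain Java is used here so the sketch is self-contained.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical diagnostic helper for illustration; names are not from the test.
class DiagnosticCheck {

  // Instead of a bare assertEquals(expected, actual), dump the tablet states
  // when the sets differ and then fail with that dump as the message.
  static void checkOrDump(Set<String> expected, Set<String> actual,
      Map<String, String> hosted) {
    if (!expected.equals(actual)) {
      StringBuilder sb = new StringBuilder()
          .append("Expected suspended: ").append(expected)
          .append("\nActual suspended: ").append(actual)
          .append("\nTablet states:\n");
      // TreeMap gives a stable, sorted dump order.
      new TreeMap<>(hosted).forEach((tablet, state) ->
          sb.append("  ").append(tablet).append(" -> ").append(state).append('\n'));
      throw new AssertionError(sb.toString());
    }
  }

  public static void main(String[] args) {
    // Hypothetical state strings, purely illustrative.
    Map<String, String> hosted = Map.of("t1", "SUSPENDED(ts1)", "t2", "ASSIGNED(ts1)");
    try {
      checkOrDump(Set.of("t1", "t2"), Set.of("t1"), hosted);
    } catch (AssertionError e) {
      System.out.println(e.getMessage());
    }
  }
}
```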

