Manno15 commented on pull request #1888: URL: https://github.com/apache/accumulo/pull/1888#issuecomment-778491267
> The main issue we have found stems from the way the tservers are shut down. This portion of the test often hangs between the shutdowns of the two tservers. It seems to be the part of the test that depends most on resources and performance: it will pass in IntelliJ but fail in the terminal. It stands to reason that increasing the SuspendDuration will increase reliability, since there is less chance that the tablets become unsuspended before the tservers are properly recovered. However, this will increase the time the test takes, because the latter part of each test waits for the suspend duration to run out in order to check that those tablets are then properly unsuspended (https://github.com/apache/accumulo/blob/87548d42c7bc02d567918f8333f0be9ed24698e8/test/src/main/java/org/apache/accumulo/test/manager/SuspendedTabletsIT.java#L248). To accommodate that, we would have to increase the timeout. I would suggest splitting that feature out into its own test. This way, we can have a longer duration for the initial issue mentioned above, and a table with a shorter SuspendDuration (since it is a per-table property, I believe) specifically for testing that tablets get reassigned once that duration ends. This will hopefully make things more reliable without increasing the running time of the test by too much.
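To make the proposed split concrete, here is a minimal, dependency-free sketch of the idea. It is not the code in SuspendedTabletsIT: the class name, helper methods, and both duration values are hypothetical, and the property map only stands in for whatever per-table configuration the real test would apply. The only name taken from Accumulo is the `table.suspend.duration` property key.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

public class SuspendDurationSplitSketch {
    // Accumulo's per-table suspend property key; all values below are hypothetical.
    static final String SUSPEND_DURATION_KEY = "table.suspend.duration";

    /** Builds the per-table properties a test table would be created with (illustrative stand-in). */
    static Map<String, String> tableProps(Duration suspendDuration) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put(SUSPEND_DURATION_KEY, suspendDuration.getSeconds() + "s");
        return props;
    }

    /** The expiry check cannot finish before the suspend duration elapses, so this is its minimum wait. */
    static Duration minExpiryWait(Duration suspendDuration) {
        return suspendDuration;
    }

    public static void main(String[] args) {
        Duration longSuspend = Duration.ofMinutes(5);   // hypothetical: generous window for tserver recovery
        Duration shortSuspend = Duration.ofSeconds(30); // hypothetical: quick expiry for the reassignment check

        // Recovery test: a long duration keeps tablets suspended while the tservers come back.
        System.out.println("recovery table props: " + tableProps(longSuspend));

        // Expiry test: a separate table with a short duration, so the wait stays small.
        System.out.println("expiry table props:   " + tableProps(shortSuspend));

        // One combined test would have to wait out the long duration for the expiry check.
        System.out.println("combined expiry wait: " + minExpiryWait(longSuspend).getSeconds() + "s");
        System.out.println("split expiry wait:    " + minExpiryWait(shortSuspend).getSeconds() + "s");
    }
}
```

The point of the sketch is only that the expiry wait is bounded below by whichever suspend duration the table under test uses, so giving the expiry check its own table with a short duration decouples its running time from the longer duration chosen for the recovery check.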
