ctubbsii commented on a change in pull request #1888:
URL: https://github.com/apache/accumulo/pull/1888#discussion_r579480765
##########
File path:
test/src/main/java/org/apache/accumulo/test/manager/SuspendedTabletsIT.java
##########
@@ -117,7 +117,12 @@ public void shutdownAndResumeTserver() throws Exception {
suspensionTestBody((ctx, locs, count) -> {
Set<TServerInstance> tserversSet = new HashSet<>();
for (TabletLocationState tls : locs.locationStates.values()) {
- if (tls.current != null) {
+
+ TabletLocator tl = TabletLocator.getLocator(ctx, MetadataTable.ID);
+ String metadataTserver =
+ tl.locateTablet(ctx, tls.extent.toMetaRow(), false,
false).tablet_location;
+ // if the server does not hold the metadata, add it to the list to be
shutdown
+ if (tls.current != null &&
!tls.current.toString().startsWith(metadataTserver)) {
Review comment:
> > 1. the tserver could still be added to the list if another tablet
adds it, so it could still cause problems, and
>
> I don't see how this could happen. The name of the tserver with the
metadata is compared with the tserver that the current tablet is on. Therefore
any tablet on that server should yield the same result.
>
> > 1. this could result in no tservers shutting down, therefore not
actually checking the conditions the test is intended to check
>
> This would only happen if the metadata was not on any of the servers. I'm
not sure this could happen since the metadata would need to be on one of the
available servers from my understanding.
>
My understanding is that it is looping through all tablet locations, each
assigned to `tls` as it loops. For each tablet (`tls.extent`), it tries to find
the location of the metadata tablet that is hosting that tablet's metadata
(locating `tls.extent.toMetaRow()`). It will *not* add the tserver, if the
location is the same as the current location.
However, consider tablets `A` and `B`, whose metadata tablets are in `m(A)`
and `m(B)`:
```
tserver1 contains A, B, and m(A)
tserver2 contains m(B)
```
As you go through this loop, we see tablet `A` first, but since `m(A)` and
`A` are both hosted on `tserver1`, it is not added to the list to shut down.
Next, going through the loop, we reach tablet `B`, but since `m(B)` is hosted
on `tserver2`, then `tserver1` is added to the list to shut down.
`tserver1` is still added to the list to shut down at the end of the loop,
not by tablet `A`, but by tablet `B`. To get your strategy to work, you would
have to add *all* tservers whose metadata tablets were hosted on the same
tserver to a block list, and then do `tserversSet.removeAll(noShutdownList)` at
the end of this loop. However, you could easily get into a situation where all
tservers are shut down (`tserver1 = {A, B}, tserver2 = {m(A), m(B)}`), or none
are (`tserver1 = {A, m(A)}, tserver2 = {B, m(B)}`).
> It may be a good idea to add an assert to make sure two servers are on
track to be shutdown.
I thought this test only ran two total tservers.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]